pixelsearch Posted October 30, 2018 (edited)

Hi all,

Just curious: I would like to know whether StringRegExp could do the same job as a script I just completed.

The script opens an .htm file stored on disk, corresponding to a web page's source. In this example it is the source of https://www.autoitscript.com/site/autoit-script-editor/

Then it searches for the URLs of .png images in this way:

1) find the 1st string .png"
2) find the preceding matching double quote "
3) extract what's between the 2 double quotes: now you have a valid URL
4) loop to find the others.

When finished, it displays the array (pic below the script), then exports it to a .txt file.

I feel that our RegExp experts could do the same in one line, returning the same array of URLs using StringRegExp. This RegExp thing always appears like black magic to me.

Edit: Row 5 in the pic is funny, it shows only ".png" but well... it's the same in the source file (just checked).

    ; 0000000001111111111222222222233333333334444444444555555555566666666667
    ; 1234567890123456789012345678901234567890123456789012345678901234567890
    ; ---------"---------.png"----------"-------------------.png"-----------etc...

    #include <Array.au3>
    #include <File.au3>
    #include <MsgBoxConstants.au3>

    HotKeySet("{ESC}", "Terminate") ; in case the While loop never ends, who knows...

    Global $iReDimWhen = 300, $iRow = 0
    Global $aURL[1 + $iReDimWhen] ; avoid using $aURL[0] , making the array 1-based (personal pref)

    ; "default.htm" is the source of https://www.autoitscript.com/site/autoit-script-editor/
    $sFileInput = FileRead(@ScriptDir & "\default.htm")
    $iStart = 1 ; search will start from 1st byte in source
    $sFileOutput = @ScriptDir & "\default.txt" ; watchout: file will be overwritten without warning

    While 1
        ; search for string .png" in source
        ; 2 = not case sensitive (faster comparison), 1 = 1st occurrence (left to right)
        ; start searching from pos. $iStart ($iStart will be incremented later)
        $iPosition2 = StringInStr($sFileInput, '.png"', 2, 1, $iStart)
        If $iPosition2 = 0 Then ExitLoop ; all done

        ; search for a preceding " that matches with string .png" found just before.
        ; -1 = 1st occurrence (right to left), start from $iPosition2 -1
        ; number of characters to search : don't overlap a preceding double quote already found.
        $iPosition1 = StringInStr($sFileInput, '"', 2, -1, $iPosition2 -1, $iPosition2 - $iStart)
        If $iPosition1 = 0 Then
            MsgBox($MB_SYSTEMMODAL, "Abort", "Missing preceding double quote in source") ; no way !
            Exit
        EndIf

        ; update array of URL's , Redim each 300 rows when necessary
        $iRow += 1 ; 1, 2...
        If $iRow > $iReDimWhen Then
            $iReDimWhen += 300 ; 600, 900...
            ReDim $aURL[1+ $iReDimWhen]
        EndIf

        ; example at the top: if .png" found at pos. 20 and preceding " found at pos. 10 :
        ; then extract 13 characters from pos. 11 to 23 (without double-quotes)
        $aURL[$iRow] = StringMid($sFileInput, $iPosition1 +1, $iPosition2 - $iPosition1 +3)

        ; next search will start just after the double quote found in .png" , in our example at pos. 25
        $iStart = $iPosition2 + 5
    Wend

    If $iRow = 0 Then
        MsgBox($MB_TOPMOST, "Nothing found", "No URL retrieved") ; $MB_SYSTEMMODAL => truncated title :(
        Exit
    EndIf

    ReDim $aURL[1+ $iRow] ; delete all empty rows up in the array

    _ArrayDisplay($aURL, $iRow & " URL(s) retrieved", "1:", 0, Default, "URL")
    ; "1:" = show rows 1-end . 0 = align left . Default = user separator (deprecated)

    ; Write array to a file by passing the file name (file will be overwritten without warning)
    _FileWriteFromArray($sFileOutput, $aURL, 1) ; 1 = 1-based (ignore row 0, it's empty anyway)

    ; Display the file.
    ShellExecute($sFileOutput)

    ; ================================================================================================
    Func Terminate()
        HotKeySet("{ESC}") ; avoid too long or repeated press on Esc
        If MsgBox(BitOr($MB_TOPMOST, $MB_OKCANCEL), "Escape pressed", "End script ?") = $IDOK Then Exit
        HotKeySet("{ESC}", "Terminate")
    EndFunc ; ==> Terminate

Attachment: default.htm

Edited October 30, 2018 by pixelsearch

TheXman Posted October 30, 2018

2 hours ago, pixelsearch said: I would like to know if StringRegExp could be used to do same as a script I just completed ?

The answer, which I think you know already, is yes, it could be done with a single regular expression.

2 hours ago, pixelsearch said: This RegExp thing always appears like black magic to me

Don't you think now would be a good time for you to start learning how to write your own so you won't have to keep asking others to do your work for you? I mean really. You didn't even make an attempt.

pixelsearch Posted October 30, 2018 (Author)

This Forum is definitely not a friendly place. I have never asked anyone for a single line of code in my life and was just curious to know if it was possible. Is that so hard to understand?

So why the rude answer about "asking others to do your work for you?" My code above works perfectly, and you are telling me about "others to do work for me"? Who do you think you are to answer in such a rude way?

I hope the Mods will send you some gentle PMs asking you to cool off, because you're the one who should be educated.

FrancescoDiMuro Posted October 30, 2018

@pixelsearch @TheXman

Sorry for stepping into this thread, but every single post that @pixelsearch has written was really kind, and he never asked for a piece of code. From what I could see, @pixelsearch has always tried to help (and helped "concretely") a lot of people, always in a gentle way, without "attacking" or judging someone for asking something.

From this post, I see that @pixelsearch is not trying to get others to write code for him. Instead, he would just like the experts to answer his question: "Could SRE do what my script actually does?". I don't see him asking for any code, or for anything else except an answer to his question.

And, @TheXman, please don't feel like I am judging you, since I am no one to say who is who. To borrow the 2 cents a wise person gave us a few days ago on here, this is probably a misunderstanding, since here on the Forum "feelings" cannot be interpreted, because everyone writes and reads text, which is devoid of "feelings".

So, please, don't argue over a thing like that. @pixelsearch didn't ask for a piece of code, just a single question, and that's the whole misunderstanding.

Have a good day, both of you.

Marc Posted October 30, 2018 (edited)

The answer is "yes".

As an example:

    #include <Array.au3>
    #include <File.au3>

    $sFileInput = FileRead(@ScriptDir & "\default.htm")
    $aHits = StringRegExp($sFileInput, '(?is)"([^"]*?\.png)"', 3)
    _ArrayDisplay($aHits)

In theory, the regex

    $aHits = StringRegExp($sFileInput, '(?is)"(.*?\.png)"', 3)

should do the same thing (in my eyes), but it captures the "og:image" tags, too.

Best regards,
Marc

Edited October 30, 2018 by Marc: removed unneeded lines of code

Any of my own codes posted on the forum are free for use by others without any restriction of any kind. (WTFPL)
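
A minimal sketch of why the two patterns above differ, using a made-up one-line sample rather than the real default.htm (the example.com URL is an assumption for illustration only). With the lazy dot, the match starts at the first double quote it meets and `.` is allowed to walk across later quotes until it reaches `.png"`; the negated class `[^"]` cannot cross a quote, so the match has to start at the quote that directly opens the URL. The sketch also checks @error, since with flag 3 StringRegExp does not return an array when nothing matches.

```autoit
; Hypothetical sample line; the URL is invented for illustration
Local $sHtml = '<meta property="og:image" content="https://example.com/logo.png">'

; Lazy dot: may cross intermediate double quotes
Local $aLazyDot = StringRegExp($sHtml, '(?is)"(.*?\.png)"', 3)
If Not @error Then ConsoleWrite("lazy dot     : " & $aLazyDot[0] & @CRLF)
; expected: og:image" content="https://example.com/logo.png

; Negated class: never crosses a double quote
Local $aNegClass = StringRegExp($sHtml, '(?is)"([^"]*?\.png)"', 3)
If Not @error Then ConsoleWrite("negated class: " & $aNegClass[0] & @CRLF)
; expected: https://example.com/logo.png
```

Run from SciTE, the two lines appear in the output pane; the first capture shows where the extra "og:image" text comes from.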

pixelsearch Posted October 30, 2018 (Author)

@FrancescoDiMuro: it's like you're reading my mind, that's a bit scary. Because my one and only thought, when opening the thread, was exactly what you wrote. Let me please quote you:

46 minutes ago, FrancescoDiMuro said: "... to let experts answer to his question: "Could SRE do what my script actually does?"

And many thanks for your appreciation of my posts, it means a lot.

@Marc: you did it! I was just curious to know if it was possible, because it seems so complicated: an endless text file with so many double quotes in it. It was like some kind of challenge sent to the RegExp community and you solved it so easily, bravo.

Have a great day, both of you.

caramen Posted October 30, 2018 (edited)

1 hour ago, pixelsearch said: It was like some kind of challenge sent to the RegExp community

I will try... to disturb @mikell and make, with him, a very clear and easy example topic about SRE, if he is OK with it. (I haven't asked him yet.) He will be surprised, hah. I have had that in mind for a long time.

@TheXman No one asked you to do anything... If you disagree with someone, go along your path. There are already a lot of good moderators to do what needs to be done here.

Edited October 30, 2018 by caramen

mikell Posted October 30, 2018 (edited)

6 hours ago, pixelsearch said: Could RegExp be used here ?

Yes you can.

For example, there is a common way to do that using a character class: [^"]+? which means: "one or more non-quote characters, lazy".

So the expression looks like this:

    #include <Array.au3>

    $url = "https://www.autoitscript.com/site/autoit-script-editor/"
    $s = BinaryToString(InetRead($url))
    $a = StringRegExp($s, '(?i)(?:https|/autoit3)[^"]+?\.png', 3)
    $a = _ArrayUnique($a)
    _ArrayDisplay($a)

But there are many ways to skin this cat.

Edited October 30, 2018 by mikell: added _ArrayUnique
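
A small offline sketch of the same character-class pattern, so it can be tried without downloading the page. The sample markup and its URLs are made up for illustration. It assumes the default arguments of _ArrayUnique, which, as far as I know, put the number of unique items in element 0 of the returned array, so the loop starts at index 1.

```autoit
#include <Array.au3>

; Hypothetical sample markup; the URLs are invented, and one is repeated on purpose
Local $s = '<img src="https://example.com/a.png"> ' & _
        '<img src="/autoit3/b.png"> ' & _
        '<img src="https://example.com/a.png">'

; Same pattern as in the post above: a known prefix, then non-quote characters up to .png
Local $a = StringRegExp($s, '(?i)(?:https|/autoit3)[^"]+?\.png', 3)
If @error Then
    ConsoleWrite("No .png URL found" & @CRLF)
    Exit
EndIf

ConsoleWrite(UBound($a) & " matches, duplicates included" & @CRLF)

; Assumption: with default arguments, $aUnique[0] holds the count of unique items
Local $aUnique = _ArrayUnique($a)
For $i = 1 To $aUnique[0]
    ConsoleWrite($aUnique[$i] & @CRLF)
Next
```

Because this pattern is anchored on the prefixes `https` and `/autoit3` rather than on the opening quote, it only finds URLs that start with one of those two strings; that fits the page in question but is not a general rule.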

pixelsearch Posted November 30, 2018 (Author)

Hi all,

Please let me ask the final question right away, then give the (very long!) explanation that led me here.

Question: why can't I write binary into a .txt file when it works in an .ini file?

Since I created this thread, I have (slowly) started to learn RegExp and downloaded Lazycat's (co-writer of Koda) great script named RegExpQuickTester25.zip, found here: https://www.autoitscript.com/forum/topic/27025-regexp-quick-tester/

The preceding pic corresponds to what has been discussed in this thread (see 1st post), now using RegExp Tester:

* 1st Edit control "Match text" contains the pasted file "default.htm" (48Kb) attached in the 1st post
* 2nd Edit control "Search pattern" contains Marc's 1st regexp (?is)"([^"]*?\.png)"

You may notice that I pasted a 48Kb file into an Edit control, but I had to modify the script for that, because Edit controls can't natively accept more than 30,000 characters. So I added a line that made it possible:

    GUICtrlSendMsg($ebTest, $EM_LIMITTEXT, -1, 0) ; added 5 nov 2018 to allow unlimited text size in edit control (but problems to come in .ini file)

I also modified Lazycat's script to accept "drag and drop" of any text file's content into the "Match text" Edit control. To do this, 3 parts of code were added:

* $WS_EX_ACCEPTFILES as extended style in GUICreate()
* $GUI_DROPACCEPTED added for the "Match text" Edit control
* $GUI_EVENT_DROPPED added and tested during GUIGetMsg()

All this works fine (plus other modifications) and size is no longer an issue with the "Match text" Edit control.

Now the problems arrive... Lazycat uses an .ini file to store everything you see in the preceding pic (except the automatic results, of course). That ini file is read each time the script is run, in order to automatically fill all controls with what was left when leaving the previous session. Then the ini file is written when the script ends.

He didn't have problems with the ini file section size because he wrote "...Here is my version. It's concept is a bit different, this is not work with big files..."

But BrewManNH reminds us that "The only 32K limit that applies to INI files is reading a section", in this link: https://www.autoitscript.com/forum/topic/145362-ini-file-storage-capacity/?do=findComment&comment=1027325

With the 32K limit indicated by BrewManNH, I can't use this ini file "as-is" when the "Match text" Edit control contains a "big" file (default.htm is a good example, its size is 48Kb), because when read back, the "Match text" Edit control contents will be truncated. Very bad!

Here are the 2 parts of code concerning the ini file in Lazycat's code. Please note the use of BinaryToString() during the Read process, then Binary() during the Write process. I don't know exactly why he chose binary instead of plain text; maybe one of our readers will know.

    ; Reload recent data
    $nMode = Number(IniRead($sIniFile, "Main", "Mode", 0))
    GUICtrlSendMsg($cbMode, $CB_SETCURSEL, $nMode, 0)
    GUICtrlSetState($cbLineNum, IniRead($sIniFile, "Main", "LineNum", 1))
    GUICtrlSetData($ibRepCount, IniRead($sIniFile, "Main", "ReplaceCount", 0))
    GUICtrlSetData($ebTest, BinaryToString(IniRead($sIniFile, "Main", "Text", "")))
    GUICtrlSetData($ebRegExp, BinaryToString(IniRead($sIniFile, "Main", "Pattern", "")))
    GUICtrlSetData($ebRegExpReplace, BinaryToString(IniRead($sIniFile, "Main", "Replace", "")))

    ; Write
    Case $GUI_EVENT_CLOSE, $btnClose
        IniWrite($sIniFile, "Main", "Mode", $nMode)
        IniWrite($sIniFile, "Main", "LineNum", GUICtrlRead($cbLineNum))
        IniWrite($sIniFile, "Main", "ReplaceCount", GUICtrlRead($ibRepCount))
        IniWrite($sIniFile, "Main", "Pattern", Binary(GUICtrlRead($ebRegExp)))
        IniWrite($sIniFile, "Main", "Replace", Binary(GUICtrlRead($ebRegExpReplace)))
        IniWrite($sIniFile, "Main", "Text", Binary(GUICtrlRead($ebTest)))

Here is how the ini file, opened with Notepad, appears on my computer. Please note how the binary parts are correctly written into it:

    [Main]
    Mode=3
    LineNum=1
    ReplaceCount=0
    Pattern=0x283F69732922285B5E225D2A3F5C2E706E672922
    Replace=
    Text=0x3C21444F43545950452068746D6C3E0D0A3C212D2D5B696620494520365D3E0D0A3C68746D6C2069643D2269653622206C616E6.... very long string ....

So now, what I'm working on is to keep the ini file for everything... except for the "Match text" Edit control! The "Match text" Edit control's content will be written into an additional txt file.

This seems to work when I write the txt file as plain text. But no matter how many tests I did yesterday, I couldn't write binary into a txt file, even when I opened the file with the write + binary flags etc... And even after having tested mLipok's script here: https://www.autoitscript.com/forum/topic/167782-write-binary-data/?do=findComment&comment=1228244

I run mLipok's script, type 123 in the InputBox, and when I open the test.txt file, what do I see in it? 123 in plain text.

If I did the same using Lazycat's script, typing 123 as the Search Pattern, the content of the ini file would show this: Pattern=0x313233, correct!

So the question is: why does this Binary thing work in an ini file and not in txt files (at least for me)? Of course I don't need binary at all and everything could be written in plain text, but it's very frustrating not to understand why it's not working!

Now, in case I don't keep any Binary in the ini file, I'll have to manage the "whitespaces" (maybe left on purpose), for example in the Search Pattern control (thanks to the help file, IniWrite topic, for mentioning it), by adding double quotes around the string. Chr(34) does the job after testing... you'd better not forget them or your whitespaces are gone!

    ; IniWrite($sIniFile, "Main", "Pattern", Binary(GUICtrlRead($ebRegExp)))
    IniWrite($sIniFile, "Main", "Pattern", chr(34) & GUICtrlRead($ebRegExp) & chr(34))

Maybe that's the reason why Lazycat used (successfully) Binary? To avoid losing mandatory whitespaces? I don't think he did it for security reasons. I guess we'll never know...

When I'm satisfied with the modified script, I'll share it in the Examples section of the Forum. Maybe this reworked version could suit some readers, who knows?

For those who didn't fall asleep in the middle of this looong post, thanks for reading... probably one of the longest posts in this Forum.
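
On the binary question above, one likely explanation, offered as a sketch and not confirmed anywhere in this thread: IniWrite stores its value as text, so a Binary value is converted to its "0x..." hex string before it is written, whereas FileWrite given binary data writes the raw bytes. The bytes 0x31 0x32 0x33 are simply the characters 123 when Notepad displays them. Under that assumption, converting the binary to a string first with String() should put the same hex text into a .txt file. The file name test_hex.txt is hypothetical.

```autoit
; Sketch only: assumes String() applied to a Binary value yields its "0x..." hex text
Local $sFile = @TempDir & "\test_hex.txt" ; hypothetical file name
Local $dData = StringToBinary("123")      ; the three bytes 0x31 0x32 0x33

; 1) Raw bytes: the file holds 3 bytes, which Notepad shows as 123
Local $hFile = FileOpen($sFile, 2 + 16)   ; 2 = overwrite, 16 = binary mode
FileWrite($hFile, $dData)
FileClose($hFile)
ConsoleWrite("raw bytes read back as text: " & FileRead($sFile) & @CRLF)

; 2) Hex text: convert to a string first, as IniWrite appears to do implicitly
$hFile = FileOpen($sFile, 2)              ; 2 = overwrite, text mode
FileWrite($hFile, String($dData))
FileClose($hFile)
Local $sHex = FileRead($sFile)
ConsoleWrite("hex text in file: " & $sHex & @CRLF)

; Decode the hex text back, mirroring BinaryToString(IniRead(...)) in Lazycat's code
ConsoleWrite("decoded again: " & BinaryToString(Binary($sHex)) & @CRLF)
```

If this reading is right, the .txt file was being written correctly all along; it was just written as bytes, where the ini file holds a textual hex dump of them.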

JLogan3o13 (Moderator) Posted November 30, 2018

@pixelsearch in reading through this thread let me say that you are not alone in avoiding regex; it still makes my eyes bleed to this day. As to the friendliness of the forum, I would say that you have gotten great suggestions from Marc and mikell - focus on those. Unfortunately you have some folks, as you saw above, that seem intent only on proving what an ass they can be without adding anything of value to the forum; these you have to ignore.

As to this probably being one of the longest posts on the forum, not even close. Look in the Chat section for some meandering word walls that should really be broken up into chapters.

ViciousXUSMC Posted December 3, 2018

I am not a regex master, probably not much more than a novice. But I have learned it and I do use it, and doing so has helped me with a lot of little automations.

Besides googling every "learn regex" site I could find, and asking questions here as needed, the one resource that helped me the most (and that I still use almost every time I need to write any regex) is this one: https://regex101.com/

It tells you what the regex is trying to do, and shows you live how it is doing it.

jchd Posted December 3, 2018

This site also offers debugging a pattern step by step.

pixelsearch Posted December 3, 2018 (Author)

Thanks, all, for the links and advice.