Could RegExp be used here ?


Hi all,
Just curious: could StringRegExp be used to do the same job as a script I just completed?
The script opens an .htm file stored on disk, corresponding to a web source page. In this example it will be the source of https://www.autoitscript.com/site/autoit-script-editor/

Then it searches for URLs pointing to .png images in this way:
1) find the 1st occurrence of the string   .png"
2) find the preceding matching double quote  "
3) extract what's between the 2 double quotes: that's a valid URL
4) loop to find the others. When finished, display the array (pic below the script), then export it to a .txt file

I feel that our RegExp experts could do the same in one line, returning the same array of URLs using StringRegExp. This RegExp thing always looks like black magic to me :)

Edit: Row 5 in the pic is funny, it shows only ".png", but well... it's the same in the source file (just checked)

; 0000000001111111111222222222233333333334444444444555555555566666666667
; 1234567890123456789012345678901234567890123456789012345678901234567890
; ---------"---------.png"----------"-------------------.png"-----------etc...

#include <Array.au3>
#include <File.au3>
#include <MsgBoxConstants.au3>

HotKeySet("{ESC}", "Terminate") ; in case the While loop never ends, who knows...

Global $iReDimWhen = 300, $iRow = 0
Global $aURL[1 + $iReDimWhen] ; avoid using $aURL[0] , making the array 1-based (personal pref)

; "default.htm" is the source of https://www.autoitscript.com/site/autoit-script-editor/
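$sFileInput = FileRead(@ScriptDir & "\default.htm")
If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Abort", "Could not read default.htm") ; stop early if the source file is missing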
$iStart = 1 ; search will start from 1st byte in source

$sFileOutput = @ScriptDir & "\default.txt" ; watchout: file will be overwritten without warning

While 1
   ; search for string  .png"  in source
   ; 2 = not case sensitive (faster comparison), 1 = 1st occurrence (left to right)
   ; start searching from pos. $iStart ($iStart will be incremented later)
   $iPosition2 = StringInStr($sFileInput, '.png"', 2, 1, $iStart)
   If $iPosition2 = 0 Then ExitLoop  ; all done

   ; search for a preceding  "  that matches with string  .png"  found just before.
   ; -1 = 1st occurrence (right to left), start from $iPosition2 -1
   ; number of characters to search : don't overlap a preceding double quote already found.
   $iPosition1 = StringInStr($sFileInput, '"', 2, -1, $iPosition2 -1, $iPosition2 - $iStart)

   If $iPosition1 = 0 Then
      MsgBox($MB_SYSTEMMODAL, "Abort", "Missing preceding double quote in source") ; no way !
      Exit
   EndIf

   ; update array of URL's , Redim each 300 rows when necessary
   $iRow += 1  ; 1, 2...
   If $iRow > $iReDimWhen Then
      $iReDimWhen += 300  ; 600, 900...
      ReDim $aURL[1+ $iReDimWhen]
   EndIf

   ; example at the top: if  .png"  found at pos. 20 and preceding  "  found at pos. 10 :
   ; then extract 13 characters from pos. 11 to 23 (without double-quotes)
   $aURL[$iRow] = StringMid($sFileInput, $iPosition1 +1, $iPosition2 - $iPosition1 +3)

   ; next search will start just after the double quote found in  .png" , in our example at pos. 25
   $iStart = $iPosition2 + 5
Wend

If $iRow = 0 Then
   MsgBox($MB_TOPMOST, "Nothing found", "No URL retrieved") ; $MB_SYSTEMMODAL => truncated title :(
   Exit
EndIf

ReDim $aURL[1+ $iRow] ; delete all empty rows up in the array

_ArrayDisplay($aURL, $iRow & " URL(s) retrieved", "1:", 0, Default, "URL")
; "1:" = show rows 1-end  . 0 = align left . Default = user separator (deprecated)

; Write array to a file by passing the file name (file will be overwritten without warning)
_FileWriteFromArray($sFileOutput, $aURL, 1) ; 1 = 1-based (ignore row 0, it's empty anyway)

; Display the file.
ShellExecute($sFileOutput)

; ================================================================================================
Func Terminate()
   HotKeySet("{ESC}")   ; avoid too long or repeated press on Esc
   If MsgBox(BitOr($MB_TOPMOST, $MB_OKCANCEL), "Escape pressed", "End script ?") = $IDOK Then Exit
   HotKeySet("{ESC}", "Terminate")
EndFunc     ; ==> Terminate

5bd7a398b5c8a_CouldRegExpbeusedhere.jpg.fc8dfdb5156d37c83302ad17141db521.jpg

default.htm

Edited by pixelsearch

2 hours ago, pixelsearch said:

I would like to know if StringRegExp could be used to do same as a script I just completed ?

The answer, which I think you know already, is yes it could be done with a single regular expression.

2 hours ago, pixelsearch said:

This RegExp thing always appears like black magic to me

Don't you think now would be a good time for you to start learning how to write your own so you won't have to keep asking others to do your work for you?  I mean really.  You didn't even make an attempt. 


This Forum is definitely not a friendly place.
I have never asked anyone for a single line of code in my life; I was just curious to know if it was possible. Is that so hard to understand? So why the rude answer about "asking others to do your work for you"?

My code above works perfectly, and you talk to me about getting "others to do your work for you"? Who do you think you are to answer in such a rude way?
I hope the Mods will send you some gentle PMs asking you to cool off, because you're the one who should be educated.


@pixelsearch @TheXman

Sorry for stepping into this thread, but every single post from @pixelsearch has been really kind, and he has never asked for a piece of code.

From what I could see, @pixelsearch has always tried to help (and has "concretely" helped) a lot of people, always in a gentle way, without "attacking" or judging anyone for asking something.

From this post, I see that @pixelsearch is not trying to get others to write code for him; he would just like the experts to answer his question: "Could SRE do what my script actually does?".

I don't see him asking for any code, or for anything other than an answer to his question.

And, @TheXman, please don't feel like I am judging you, since I am no one to say who is who.

To borrow the two cents a wise person gave us here a few days ago, this is probably a misunderstanding, since here on the Forum "feelings" cannot be interpreted: everyone writes and reads text, which is devoid of "feelings".

So, please, don't argue over a thing like that.

@pixelsearch didn't ask for a piece of code, he asked a single question, and that's the whole misunderstanding.

Have a good day both of you :)


The answer is "yes" :)

As an example:

#include <Array.au3>
#include <File.au3>

$sFileInput = FileRead(@ScriptDir & "\default.htm")

$aHits = StringRegExp($sFileInput, '(?is)"([^"]*?\.png)"', 3)
_ArrayDisplay($aHits)

In theory, the Regex

$aHits = StringRegExp($sFileInput, '(?is)"(.*?\.png)"', 3)

should do the same thing (in my eyes) but it captures the "og:image" tags, too. 
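
A note on why the second pattern also picks up the og:image tags, as far as I can tell: the lazy .*? still starts at the first double quote it meets and is allowed to step over later quotes until it reaches .png", whereas [^"] cannot cross a quote. Here is a minimal sketch on a made-up one-line sample (the example.com URL and the variable names are assumptions for illustration, not taken from default.htm):

```autoit
; Hypothetical sample line, similar in shape to an og:image meta tag
Local $sSample = '<meta property="og:image" content="https://example.com/logo.png">'

; [^"] cannot step over a double quote, so the capture stays inside one quoted value
Local $aStrict = StringRegExp($sSample, '(?is)"([^"]*?\.png)"', 3)

; .*? is lazy, but the match still begins at the first quote and runs until .png"
Local $aLoose = StringRegExp($sSample, '(?is)"(.*?\.png)"', 3)

ConsoleWrite("Strict: " & $aStrict[0] & @CRLF) ; https://example.com/logo.png
ConsoleWrite("Loose : " & $aLoose[0] & @CRLF)  ; og:image" content="https://example.com/logo.png
```

So "lazy" only means "stop as early as possible once started"; it does not make the engine pick the closest opening quote.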

 

Best regards,

Marc

Edited by Marc
removed unneeded lines of code

Any of my own codes posted on the forum are free for use by others without any restriction of any kind. (WTFPL)


@FrancescoDiMuro : it's like you're reading my mind, that's a bit scary :)
Because my one and only thought when opening the thread was exactly what you wrote. Please let me quote you:

46 minutes ago, FrancescoDiMuro said:

"... to let experts answer to his question: "Could SRE do what my script actually does?"

And many thanks for your kind words about my posts, it means a lot.

@Marc : you did it !
I was just curious to know if it was possible, because it seemed so complicated: an endless text file with so many double quotes in it. It was like some kind of challenge sent to the RegExp community, and you solved it so easily, bravo.

Have a great day both of you :)


1 hour ago, pixelsearch said:

It was like some kind of challenge sent to the RegExp community

I will try... to bother @mikell into writing, together with me, a very clear and easy example topic about SRE, if he is OK with it. (I haven't asked him yet :sweating:) He will be surprised, hah.

I have had that in mind for a long time.

@TheXman No one asked you to do anything... If you disagree with someone, go your own way. There are already plenty of good moderators to do what needs to be done here.

Edited by caramen


6 hours ago, pixelsearch said:

Could RegExp be used here ?

Yes you can  :)
For example, there is a common way to do that using a character class : [^"]+?  which means : "one or more non-quote characters, lazy"
So the expression looks like this :

#include <Array.au3>

$url = "https://www.autoitscript.com/site/autoit-script-editor/"
$s = BinaryToString(InetRead($url))

$a = StringRegExp($s, '(?i)(?:https|/autoit3)[^"]+?\.png', 3)
$a = _ArrayUnique($a)
_ArrayDisplay($a)

But there are many ways to skin this cat  :)

Edited by mikell
added _ArrayUnique

  • 1 month later...

Hi all :)
Please let me ask the final question right now, then the (very long !) explanation that led me here :
Question : why can't I write binary into a .txt file when it works into an .ini file ?

Since I created this thread, I started to learn (slowly) RegExp and downloaded Lazycat's (co-writer of Koda) great script named RegExpQuickTester25.zip, found here :
https://www.autoitscript.com/forum/topic/27025-regexp-quick-tester/

regexp.jpg.1daaa0869037559ceb3300a5602388d3.jpg

The preceding pic corresponds to what has been discussed in this thread (see 1st post), now using RegExp Tester :
* 1st Edit control "Match text" contains the pasted file "default.htm" (48Kb) attached in 1st post
* 2nd Edit control "Search pattern" contains Marc's 1st regexp  (?is)"([^"]*?\.png)"

You may notice that I pasted a 48 KB file into an Edit control. I had to modify the script for that, because Edit controls don't natively accept more than 30,000 characters. So I added a line that did the trick:

GUICtrlSendMsg($ebTest, $EM_LIMITTEXT, -1, 0) ; added 5 nov 2018 to allow unlimited text size in edit control (but problems to come in .ini file)

I also modified Lazycat's script to accept "drag and drop" of any text file's content into the "Match text" Edit Control . To do this, 3 parts of code were added :

* $WS_EX_ACCEPTFILES as extended style in GUICreate()
* $GUI_DROPACCEPTED added for the "Match text" Edit Control
* $GUI_EVENT_DROPPED added and tested during GUIGetMsg()
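
The three additions listed above can be sketched in a small standalone script. This is only an assumed minimal layout, not Lazycat's code: the names $hGUI and $idEdit, the window size, and the choice to replace the control's content with the dropped file's text are all illustrative.

```autoit
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>

; 1) $WS_EX_ACCEPTFILES as extended style of the window (hypothetical title and size)
Local $hGUI = GUICreate("Drop a text file", 400, 300, -1, -1, -1, $WS_EX_ACCEPTFILES)

; 2) $GUI_DROPACCEPTED on the Edit control that should receive the file
Local $idEdit = GUICtrlCreateEdit("", 10, 10, 380, 280)
GUICtrlSetState($idEdit, $GUI_DROPACCEPTED)

GUISetState(@SW_SHOW, $hGUI)

While 1
    Switch GUIGetMsg()
        Case $GUI_EVENT_CLOSE
            ExitLoop
        Case $GUI_EVENT_DROPPED
            ; 3) on a drop, @GUI_DropId is the target control and @GUI_DragFile the dropped file
            If @GUI_DropId = $idEdit Then GUICtrlSetData($idEdit, FileRead(@GUI_DragFile))
    EndSwitch
WEnd
```

Checking @GUI_DropId before reading the file keeps the behaviour limited to the one control that is meant to accept drops.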

All this works fine (plus other modifications), and size is no longer an issue with the "Match text" Edit control. Now the problems arrive...

Lazycat uses an .ini file to store everything you see in the previous pic (except the automatic results, of course). That ini file is read each time the script is run, in order to automatically fill all controls with what was left at the end of the previous session. Then the ini file is written when the script ends.

He didn't run into problems with the ini file section size because, as he wrote, "...Here is my version. It's concept is a bit different, this is not work with big files..."

But BrewManNH reminds us that "The only 32K limit that applies to INI files is reading a section", in this link :
https://www.autoitscript.com/forum/topic/145362-ini-file-storage-capacity/?do=findComment&comment=1027325

With the 32K limit indicated by BrewManNH, I can't use this ini file "as-is" when the "Match text" Edit control contains a "big" file (default.htm is a good example, its size is 48 KB), because when it is read back, the contents of the "Match text" Edit control will be truncated. Very bad!

Here are the 2 parts of code concerning the ini file, in Lazycat's code : please note the use of BinaryToString() during the Read process, then Binary() during the Write process. I don't know why exactly he decided to choose binary instead of plain text, maybe one of our readers will know :)

; Reload recent data
$nMode = Number(IniRead($sIniFile, "Main", "Mode", 0))
GUICtrlSendMsg($cbMode, $CB_SETCURSEL, $nMode, 0)
GUICtrlSetState($cbLineNum, IniRead($sIniFile, "Main", "LineNum", 1))
GUICtrlSetData($ibRepCount, IniRead($sIniFile, "Main", "ReplaceCount", 0))
GUICtrlSetData($ebTest, BinaryToString(IniRead($sIniFile, "Main", "Text", "")))
GUICtrlSetData($ebRegExp, BinaryToString(IniRead($sIniFile, "Main", "Pattern", "")))
GUICtrlSetData($ebRegExpReplace, BinaryToString(IniRead($sIniFile, "Main", "Replace", "")))

; Write
Case $GUI_EVENT_CLOSE, $btnClose
IniWrite($sIniFile, "Main", "Mode", $nMode)
IniWrite($sIniFile, "Main", "LineNum", GUICtrlRead($cbLineNum))
IniWrite($sIniFile, "Main", "ReplaceCount", GUICtrlRead($ibRepCount))
IniWrite($sIniFile, "Main", "Pattern", Binary(GUICtrlRead($ebRegExp)))
IniWrite($sIniFile, "Main", "Replace", Binary(GUICtrlRead($ebRegExpReplace)))
IniWrite($sIniFile, "Main", "Text", Binary(GUICtrlRead($ebTest)))

Here is how the ini file, opened with Notepad, appears on my computer : please note how the binary parts are correctly written into it :

[Main]
Mode=3
LineNum=1
ReplaceCount=0
Pattern=0x283F69732922285B5E225D2A3F5C2E706E672922
Replace=
Text=0x3C21444F43545950452068746D6C3E0D0A3C212D2D5B696620494520365D3E0D0A3C68746D6C2069643D2269653622206C616E6.... very long string ....

So now, what I'm working on is keeping the ini file for everything... except for the "Match text" Edit control!

The content of the "Match text" Edit control will be written into an additional txt file. This seems to work when I write the txt file as plain text, but no matter how many tests I did yesterday, I couldn't write binary into a txt file, even when I opened the file with the write + binary flags, etc. And that's even after having tested mLipok's script here:
https://www.autoitscript.com/forum/topic/167782-write-binary-data/?do=findComment&comment=1228244

binary.jpg.515052597d7d408f66fbaacfc4b7781d.jpg

I run mLipok's script, type 123 in the InputBox and when I open the test.txt file, what do I see in it ?
123 in plain text :blink:

If I did the same using Lazycat's script, typing 123 as the Search Pattern, the ini file would show this:
Pattern=0x313233 , correct !

So the question is: why does this Binary thing work in an ini file and not in txt files (at least for me)?

Of course I don't need binary at all, and everything could be written in plain text, but it's very frustrating not to understand why it's not working!
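
One possible explanation, offered with caution: IniWrite() stores its value as text, so a Binary variant gets converted to its "0x..." string representation before being written, while FileWrite() writes a Binary variant as raw bytes, and the raw bytes of "123" are simply the characters 1, 2 and 3 when viewed in Notepad. If that reading is right, converting the binary to a string first should put the same hex text into a .txt file. A minimal sketch, where the file name in @TempDir and the variable names are assumptions:

```autoit
#include <FileConstants.au3>

Local $sFile = @TempDir & "\binary_demo.txt" ; hypothetical path, overwritten without warning
Local $dData = StringToBinary("123")         ; binary variant holding the bytes 31 32 33 (hex)

; Case 1: write the binary variant itself, the file receives the raw bytes
Local $hFile = FileOpen($sFile, $FO_OVERWRITE + $FO_BINARY)
FileWrite($hFile, $dData)
FileClose($hFile)
ConsoleWrite("Raw bytes read back as text: " & FileRead($sFile) & @CRLF)

; Case 2: convert to the hex string first, the file receives the "0x..." text
$hFile = FileOpen($sFile, $FO_OVERWRITE)
FileWrite($hFile, String($dData))
FileClose($hFile)

Local $sHex = FileRead($sFile)
ConsoleWrite("Hex text read back: " & $sHex & @CRLF)

; Same decoding step as in Lazycat's ini reading code
ConsoleWrite("Decoded: " & BinaryToString($sHex) & @CRLF)
```

In other words, the file written in case 1 is already "binary"; it just looks like plain text because the bytes happen to be printable characters.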

Now, if I don't keep any Binary in the ini file, I'll have to take care of the leading and trailing whitespace (maybe left on purpose), for example in the Search Pattern control (thanks to the help file's IniWrite topic for mentioning it), by surrounding the string with double quotes. Chr(34) does the job after testing... you'd better not forget them or your whitespace is gone!

; IniWrite($sIniFile, "Main", "Pattern", Binary(GUICtrlRead($ebRegExp)))
IniWrite($sIniFile, "Main", "Pattern", chr(34) & GUICtrlRead($ebRegExp) & chr(34))

Maybe that's the reason why Lazycat (successfully) used Binary? To avoid losing mandatory whitespace?
I don't think he did it for security reasons. I guess we'll never know...

When I'm satisfied with the modified script, I'll share it in the Examples section of the Forum. Maybe this reworked version could suit some readers, who knows?

For those who didn't fall asleep in the middle of this looong post, thanks for reading... probably one of the longest posts on this Forum :D


  • Moderators

@pixelsearch in reading through this thread let me say that you are not alone in avoiding regex; it still makes my eyes bleed to this day. As to the friendliness of the forum, I would say that you have gotten great suggestions from Marc and mikell - focus on those. Unfortunately you have some folks as you saw above that seem intent only on proving what an ass they can be without adding anything of value to the forum; these you have to ignore :)

As to this probably being one of the longest posts on the forum, not even close. Look in the Chat section for some meandering word walls that should really be broken up into chapters :)

 

"Profanity is the last vestige of the feeble mind. For the man who cannot express himself forcibly through intellect must do so through shock and awe" - Spencer W. Kimball


I am not a regex master, probably not much more than novice.  But I have learned it and I do use it, and doing so helped me with a lot of little automatons.

Besides googling every "learn regex" site I could find, and asking questions here as needed, the one resource that helped me the most (and I still use almost every time I need to write any regex) is this one:

https://regex101.com/

It tells you what the regex is trying to do, and shows you live how it is doing it.


This site also offers debugging a pattern step by step.
