qwert Posted June 6, 2019
I've been working to fashion a RegEx command to remove all "<script>" declarations in a page of HTML. The RegEx Toolkit has been a great help ... but I'm at a point where my knowledge of RegEx just runs out. Can someone explain why the expression in this example doesn't find (and replace) both of the script statements? Thanks in advance for any help.
FrancescoDiMuro Posted June 6, 2019
@qwert Something like this:

    #include <MsgBoxConstants.au3>
    #include <StringConstants.au3>

    Global $strString = 'This is HTML!' & @CRLF & _
            '<script>' & @CRLF & _
            'document.getElementById("demo").innerHTML = "Hello JavaScript!";' & @CRLF & _
            '</script>' & @CRLF & _
            'This is more HTML!'

    MsgBox($MB_ICONINFORMATION, "Before:", $strString)
    $strString = StringRegExpReplace($strString, '<script>[^<]*<\/script>', '[Replaced]')
    MsgBox($MB_ICONINFORMATION, "After:", $strString)
Jos (Developers) Posted June 6, 2019
It always helps when you post a scriptlet with the input data and current source, so others can run and modify it.
Jos
qwert Posted June 6, 2019
@Francesco: Well, almost! When I try it in RegEx Toolkit, it replaces the first occurrence, but not the second.

Quote:
    This is a test.
    [replaced]
    <script src='https://www.google.com/recaptcha/api.js'></script>
    </head>
    <body class="with-hero ">
    Now is the time.

Any ideas?
qwert Posted June 6, 2019
I think I found it. The first element was defined to have a hard ">" as its last character, so it could not match an opening tag that carries attributes such as src. I changed it to this and it works for both cases:

    <script[^<]*<\/script>

I appreciate your help!
@Jos: yes, I see the value in doing that. But since I was already using the Toolkit, I thought there might be a benefit in showing others that it exists.
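A minimal sketch of the difference between the two expressions, for anyone who wants to run it. The sample HTML string and the variable names are made up for illustration; only StringRegExpReplace and its @extended replacement count come from AutoIt itself.

```autoit
; Hypothetical sample input: one bare <script> tag and one with a src attribute.
Local $sHtml = '<script>var a = 1;</script>' & @CRLF & _
        "<script src='https://www.google.com/recaptcha/api.js'></script>"

; A literal '<script>' only matches an opening tag that has no attributes.
Local $sStrict = StringRegExpReplace($sHtml, '<script>[^<]*</script>', '[Replaced]')
Local $iStrict = @extended ; number of replacements performed

; Without the '>', [^<]* consumes the attributes as well as the script body.
Local $sLoose = StringRegExpReplace($sHtml, '<script[^<]*</script>', '[Replaced]')
Local $iLoose = @extended

ConsoleWrite('Strict pattern replaced ' & $iStrict & ' tag(s)' & @CRLF) ; 1
ConsoleWrite('Loose pattern replaced ' & $iLoose & ' tag(s)' & @CRLF)   ; 2
```

Note that [^<]* stops at the first "<" it meets, so this version still misses any script whose body contains that character.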
mikell Posted June 6, 2019
To have the dot match newlines, you need the (?s) option. Try this:

    $strString = StringRegExpReplace($strString, '(?s)<script.*?</script>', '[Replaced]')
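A rough comparison of the (?s) version with the negated-class version, assuming a made-up input in which the script body spans several lines and contains a "<". The sample string and variable names are hypothetical.

```autoit
; Hypothetical input: a multi-line script whose body contains '<', then a second script.
Local $sHtml = '<script>' & @CRLF & _
        'if (a < b) { run(); }' & @CRLF & _
        '</script>' & @CRLF & _
        '<p>kept</p>' & @CRLF & _
        "<script src='x.js'></script>"

; [^<]* cannot cross the '<' in "a < b", so the first block is missed.
Local $sNegated = StringRegExpReplace($sHtml, '<script[^<]*</script>', '')
Local $iNegated = @extended ; number of replacements performed

; (?s) lets '.' match line breaks; the lazy '.*?' stops at the nearest </script>.
Local $sLazy = StringRegExpReplace($sHtml, '(?s)<script.*?</script>', '')
Local $iLazy = @extended

ConsoleWrite('Negated class removed ' & $iNegated & ' block(s)' & @CRLF) ; 1
ConsoleWrite('(?s) with lazy dot removed ' & $iLazy & ' block(s)' & @CRLF) ; 2
ConsoleWrite($sLazy & @CRLF) ; the <p>kept</p> line survives
```

The lazy quantifier matters here: a greedy '.*' under (?s) would run from the first "<script" to the last "</script>" and swallow the paragraph in between.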
qwert Posted June 6, 2019
@mikell: Yes, indeed. When I tried the first expression in my full script, it only worked for scripts on a single line. The expression you provided caught all occurrences. Thanks for posting. I appreciate your help.