lostandconfused, Posted January 31, 2016

Hello,

Here is my situation. Most of the lines in the text document start with a word that is needed ("neededword"). Some lines start with another word ("nonneededword") and I would like to delete all of those lines. Here is what I have so far:

    #include <String.au3>
    #include <MsgBoxConstants.au3>
    #include <File.au3>
    #include <AutoItConstants.au3>
    #Include <Array.au3>

    Local $read = FileRead(@ScriptDir & "\input.txt") ;read file
    Local $string2 = StringRegExpReplace($read, "(?U)(?i)(?s)nonneededword.*neededword", "neededword")
    FileWrite(@ScriptDir & '\output.txt', $string2)

At this time the script above eliminates most lines that start with "nonneededword", except in two situations:

1. When two lines in a row start with "nonneededword", only the first one is deleted.
2. If the last line of the text file starts with "nonneededword", it is not deleted.

Can anyone assist me by pointing me in the right direction? Thank you!
MilesAhead, Posted January 31, 2016

My inclination, unless the file is huge, would be to read it in using _FileReadToArray and then loop through the array. You can either delete the array entries with the bad word and then write the file using _FileWriteFromArray, or just loop through the array, writing to the file a line at a time and skipping those array elements that have the bad word. The first way is probably both more efficient and easier to code. See the help file for examples of those two functions.
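A minimal sketch of the first approach described above (delete the unwanted entries, then write the array back), not code from the thread. The file names and the word are taken from the question and are assumptions about the real data. The check is a simple case-insensitive prefix comparison, so it would also match a longer word that merely begins with "nonneededword".

```autoit
#include <Array.au3>
#include <File.au3>

; Hypothetical paths and word, borrowed from the question for illustration.
Local $sInput = @ScriptDir & "\input.txt"
Local $sOutput = @ScriptDir & "\output.txt"
Local $sSkipWord = "nonneededword"

Local $aLines
If Not _FileReadToArray($sInput, $aLines) Then Exit 1

; Walk backwards so that deleting an element does not shift the ones still to be checked.
For $i = $aLines[0] To 1 Step -1
    ; "=" compares strings case-insensitively in AutoIt.
    If StringLeft($aLines[$i], StringLen($sSkipWord)) = $sSkipWord Then
        _ArrayDelete($aLines, $i)
    EndIf
Next

; Element 0 still holds the old line count, so write from index 1 onward.
; If every line was removed, nothing is written.
If UBound($aLines) > 1 Then _FileWriteFromArray($sOutput, $aLines, 1)
```

Looping from the end toward the start is the design choice that matters here: a forward loop would skip the line that slides into the slot just deleted, which is the same "two bad lines in a row" symptom reported in the question.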
JohnOne, Posted January 31, 2016

Is that regex removing whole lines? It looks like it's just replacing words. You might as well use StringReplace.
lostandconfused (Author), Posted January 31, 2016

16 minutes ago, MilesAhead said:
    My inclination, unless the file is huge, would be to read it in using _FileReadToArray. Then loop through the array. You can either delete the array entries with the bad word, then write the file using _FileWriteFromArray or just loop through the array writing to the file a line at a time skipping those array elements that have the bad word. The first way is probably both more efficient and easier to code. See help file for examples for those two functions.

Trying this now. I'm not really good with loops. Can someone show me an example, please?

13 minutes ago, JohnOne said:
    Is that regex removing whole lines, looks like it's just replacing words? Might as well use StringReplace.

It is currently removing whole lines that start with "nonneededword" if the next line starts with "neededword". The issue is that it also deletes "neededword" from the start of the next line, so I had to tell it to re-add "neededword" after the delete. I will try your solution as well.

Thank you both!
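Since a loop example was requested, here is a hedged sketch of the second approach MilesAhead mentions: loop through the array and write each line out, skipping the unwanted ones. It is an illustration, not code posted in the thread; the paths and the word are assumptions based on the question.

```autoit
#include <File.au3>
#include <FileConstants.au3>

; Hypothetical paths and word, borrowed from the question for illustration.
Local $sInput = @ScriptDir & "\input.txt"
Local $sOutput = @ScriptDir & "\output.txt"
Local $sSkipWord = "nonneededword"

Local $aLines
If Not _FileReadToArray($sInput, $aLines) Then
    MsgBox(0, "Error", "Could not read " & $sInput)
    Exit 1
EndIf

; Open in overwrite mode so that repeated runs do not append to old output.
Local $hOut = FileOpen($sOutput, $FO_OVERWRITE)
If $hOut = -1 Then
    MsgBox(0, "Error", "Could not open " & $sOutput)
    Exit 1
EndIf

; With the default flags, $aLines[0] holds the line count and the data starts at index 1.
For $i = 1 To $aLines[0]
    ; Keep the line unless it starts with the unwanted word ("<>" is case-insensitive for strings).
    If StringLeft($aLines[$i], StringLen($sSkipWord)) <> $sSkipWord Then
        FileWriteLine($hOut, $aLines[$i])
    EndIf
Next

FileClose($hOut)
```

Because every line is tested on its own, consecutive unwanted lines and an unwanted last line are both handled without special cases.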
czardas, Posted January 31, 2016 (edited)

Newline syntax in regexp is quite confusing. Anyway, this seems to work okay with CRLF as the newline. This can easily be modified.

    Local $sText = 'never compromize principles' & @CRLF & _
            'keep this' & @CRLF & _
            'nope get shut' & @CRLF & _
            'nowhere else' & @CRLF & _
            'keep not delete' & @CRLF

    Local $sAvoid = 'no|not|nope|never|nah'
    Local $sRegExp = '((?m)^' & StringReplace($sAvoid, '|','\b.+(\r\n)?|^') & '\b.+(\r\n)?)'

    MsgBox(0, "", StringRegExpReplace($sText, $sRegExp, ''))
iamtheky, Posted January 31, 2016 (edited)

Similar:

    #include<array.au3>

    Local $sText = 'keep never compromize principles' & @CRLF & _
            'keep this' & @CRLF & _
            'not keep' & @CRLF & _
            'keep' & @CRLF & _
            'keep not delete' & @CRLF

    Local $sNeededWord = 'keep'

    $aMatch = stringregexp($sText , "(?:\A|\n)(" & $sNeededWord & "\s*.*?)\r" , 3)

    msgbox(0, '' , _ArrayToString($aMatch, @CRLF))
Malkey, Posted January 31, 2016

This example deletes all the lines that do not start with any of the words in $sNeededWords.

    Local $sText = 'Keep never compromize principles.' & @CRLF & _
            'And delete this line if "and" is not needed word.' & @CRLF & _
            'Keep this' & @CRLF & _
            'not keep this line' & @CRLF & _
            'nor this line' & @CRLF & _
            'keep' & @CRLF & _
            'keeper not keep, so delete this line.'; & @CRLF

    Local $sNeededWords = 'keep|and' ; When more than one needed word, separate the words with "|".

    $aMatch = StringRegExpReplace($sText, "(?im)^(?!(" & $sNeededWords & ")\b).*\R?", "") ; Delete the lines that do not start with $sNeededWords - "keep" and "and".

    MsgBox(0, '"keep" & "and"', $aMatch)
mikell, Posted January 31, 2016

If there is only one "nonneededword", the only change needed in your code from post #1 is this:

    Local $string2 = StringRegExpReplace($read, "(?im)^nonneededword.*\R?", "")

If there are several, the regex can be adapted very easily using an alternation, as shown in the previous examples.
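For context, a sketch of the script from post #1 with mikell's pattern dropped in. The regex is the one given above; the error checks and the explicit overwrite mode are additions for illustration and were not part of the thread. The overwrite handle is used because FileWrite appends when it is given a file name for a file that already exists.

```autoit
#include <FileConstants.au3>

; Paths as in post #1.
Local $sRead = FileRead(@ScriptDir & "\input.txt")
If @error Then Exit 1

; (?i) ignores case, (?m) makes ^ match at the start of every line,
; .* consumes the rest of the line, and \R? removes the line break if there is one.
Local $sCleaned = StringRegExpReplace($sRead, "(?im)^nonneededword.*\R?", "")

; Open in overwrite mode so that an existing output.txt is replaced, not appended to.
Local $hOut = FileOpen(@ScriptDir & "\output.txt", $FO_OVERWRITE)
If $hOut = -1 Then Exit 1
FileWrite($hOut, $sCleaned)
FileClose($hOut)
```

Because each unwanted line is matched independently and the trailing `\R` is optional, both cases from the original question (two unwanted lines in a row, and an unwanted last line with no line break after it) should be covered.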
czardas, Posted January 31, 2016 (edited)

The last two examples are much better than my attempt. When I used \R (for the first time, mind), I couldn't get it to work.
mikell, Posted January 31, 2016

Yes, \R is very useful as it matches any newline sequence. For the same reason, though (it is not fixed-length), it can't be used in a lookbehind, a limitation which should be pointed out in the help file.
lostandconfused (Author), Posted January 31, 2016

Thank you everyone for the awesome help! This forum is great!