IanN1990 Posted December 20, 2019 Share Posted December 20, 2019 (edited) Hi all, I am trying to read a csv and remove any line that does not contain "File Location" or "Date Submitted" in column 4 Using the helpfile I was about to produce this; $string = StringRegExpReplace($String, '\r\n.*,.*,.*,("File Location"|Date Submitted").*', "") Irony is, it works perfectly in the opposite way i need and removes any line with "File Location" / "Date Submitted" in colum 4. I can't figure out how to invert it. Reading the helpfile it mentions ?! but no matter of combination I try the outcome is the same. Any help is appreciated! Ian Edited March 6, 2020 by IanN1990 Link to comment Share on other sites More sharing options...
FrancescoDiMuro Posted December 20, 2019 Share Posted December 20, 2019 @IanN1990 You could use a pattern like this: '^.*(?<!File Location|Date Submitted)"$' Click here to see my signature: Spoiler ALWAYS GOOD TO READ: Forum Rules Forum Etiquette Link to comment Share on other sites More sharing options...
jchd Posted December 20, 2019 Share Posted December 20, 2019 (edited) Look-behind needs to have a fixed number of characters in the first-level alternatives, so that won't work as is. '(?m)(^(?:(?!"File Location"|"Date Submitted").)*(?:\R|$))' should work (untested) Edited December 20, 2019 by jchd IanN1990 1 This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
mikell Posted December 20, 2019 Share Posted December 20, 2019 ... or the usual way "match what you want to keep and fire the rest" $txt = StringRegExpReplace($String, '(?m)^.*("File Location"|"Date Submitted").*(*SKIP)(*F)|^.*\R?', "") IanN1990 1 Link to comment Share on other sites More sharing options...
IanN1990 Posted December 20, 2019 Author Share Posted December 20, 2019 (edited) Always amazes me how people come up with these regular expressions! I wish a had a mind for it @FrancescoDiMuro Unfortunately i couldn't get yours to work @jchd and @mikell both worked though jcdh was slightly faster, processing 150k lines in <4 seconds vs 4.5! a photo finish Though i noticed it was picking up some false positives. 08/01/2019 17:08:54,"Test","IanN.1990","File Location","C:\temp\121312.log","File Location" <-- Correct 08/01/2019 17:08:54,"Test","IanN.1990","Date Reviewed","C:\temp\121312.log","File Location" <-- Incorrect as column 4 = Date Reviewed I attempted to correct this by adding in my earlier code .*,.*, with my logic being; Match anything until , then match anything until , then look for my phase but that would en-scope the false positive. I am guessing i need a way for it only to look between the second and third , ? Ian Edited December 20, 2019 by IanN1990 Link to comment Share on other sites More sharing options...
jchd Posted December 20, 2019 Share Posted December 20, 2019 Of course the pattern posted using .* doesn't fit a precise bill. If you have text fields in between (before the 4th) then it complicate matters significantly. One can't rely on comma separators (text may contain commas) nor double quotes (text may contain escaped double quotes. OTOH if your data always has a fixed number of fields, type and structure --somehow like the two lines above-- then it's possible to filter false positives out. Beware that by default, .* is greedy! Make that .*? to turn it lazy or use (?U) wisely. IanN1990 1 This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
mikell Posted December 20, 2019 Share Posted December 20, 2019 Column 4, OK. If the csv is comma separated (and some fields may be empty) then something like this should work $txt = StringRegExpReplace($String, '(?m)^([^,]*,){3}("File Location"|"Date Submitted").*(*SKIP)(*F)|^.*\R?', "") IanN1990 1 Link to comment Share on other sites More sharing options...
IanN1990 Posted December 20, 2019 Author Share Posted December 20, 2019 (edited) @mikell Your code works perfectly! and it completes all 15k rows in 500ms! Having the power of RegEx is like being a wizard! @jchd You are correct in everything you said, in this example i am quite lucky where the first four columns are generated pragmatically . So there wouldn't be a risk of the issues you mentioned. Thank you all for your help on this issue! It's much appricated Edited December 20, 2019 by IanN1990 Link to comment Share on other sites More sharing options...
IanN1990 Posted February 13, 2020 Author Share Posted February 13, 2020 (edited) @mikell Your code has served me faithfully but has recently run into a slight issue. The lastest CSV sent by management can now have dual-roles. Where each role is separated by a , Is there a way the Regex could be changed to ignore , inside of quotes? The example below shows when i am looking match column 4 but an item is missed. $CSVString = '1,Enabled,"Mr,Test",Location' & @CRLF & '2,Enabled,"Mr Test",Location' & @CRLF $Result1 = StringRegExpReplace($CSVString, '(?m)^([^,]*,){' & 3 & '}(' & 'Location' & ').*(*SKIP)(*F)|^.*\R?', "") ConsoleWrite($Result1 & @CRLF & @CRLF) $Result2 = StringRegExpReplace($CSVString, '(?m)^([^,]*,){' & 4 & '}(' & 'Location' & ').*(*SKIP)(*F)|^.*\R?', "") ConsoleWrite($Result2 & @CRLF) Edited February 13, 2020 by IanN1990 Link to comment Share on other sites More sharing options...
Malkey Posted February 13, 2020 Share Posted February 13, 2020 (edited) Here are a couple of RE patterns that appear to work. $CSVString = '1,Enabled,"Mr,Test",Test,Location' & @CRLF & _ '2,Enabled,"Mr,Test",Location,Test' ;$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]*,)){3}(Location).*(*SKIP)(*F)|^.*\R?', "") $Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]*,)){3}+(Location).*(*SKIP)(*F)|^.*\R?', "") ; Updated to overcome "unique trait", 2 posts down. ConsoleWrite($Result1 & @CRLF & @CRLF) ;$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]*,)){3}(Location).*)^.*\R?', "") $Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]*,)){3}+(Location).*)^.*\R?', "") ; Updated to overcome "unique trait", 2 posts down. ConsoleWrite($Result2 & @CRLF & @CRLF) Edited March 7, 2020 by Malkey Updated RE pattern to overcome unique trait. IanN1990 1 Link to comment Share on other sites More sharing options...
IanN1990 Posted February 14, 2020 Author Share Posted February 14, 2020 @Malkey Hi, both of your examples work perfectly. I am very jealous of the wizardry others have in RegExp, it is a hard art! Link to comment Share on other sites More sharing options...
IanN1990 Posted March 6, 2020 Author Share Posted March 6, 2020 (edited) @Malkey After a month of reliable work, i have discovered an unique trait! $CSVString = '1,Enabled,"Mr,Test",Test,Location' & @CRLF & '2,Enabled,"Mr,Test",Location,Test' & @CRLF $Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]*,)){4}(Location).*(*SKIP)(*F)|^.*\R?', "") ConsoleWrite($Result1 & @CRLF & @CRLF) $Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]*,)){4}(Location).*)^.*\R?', "") ConsoleWrite($Result2 & @CRLF & @CRLF) It is returning both lines when only one has "Location" in column four? Edited March 6, 2020 by IanN1990 Link to comment Share on other sites More sharing options...
Malkey Posted March 6, 2020 Share Posted March 6, 2020 Added a "possessive quantifier", "+". $CSVString = '1,Enabled,"Mr,Test",Test,Location' & @CRLF & _ '2,Enabled,"Mr,Test",Location,Test' & @CRLF & _ '3,Enabled,Test,"Mr,Test",Location' & @CRLF & _ '4,"Mr,Test",Enabled,Location,Test' & @CRLF $Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]+,)){4}+Location.*(*SKIP)(*F)|^.*\R?', "") ConsoleWrite($Result1 & @CRLF & @CRLF) $Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]+,)){4}+Location.*)^.*\R?', "") ConsoleWrite($Result2 & @CRLF & @CRLF) ConsoleWrite("-------------------------------------------" & @CRLF) $Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]+,)){3}+Location.*(*SKIP)(*F)|^.*\R?', "") ConsoleWrite($Result1 & @CRLF & @CRLF) $Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]+,)){3}+Location.*)^.*\R?', "") ConsoleWrite($Result2 & @CRLF & @CRLF) Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now