way1000 Posted October 30, 2017
I have a text file with 1 million lines, of which 800k have to be removed. I want to keep the other 200k lines, and the line order has to stay the same. Removed lines should become empty lines, so the file still has 1 million lines but only 200k of them contain text.

spudw2k Posted October 30, 2017
How are the lines to keep and the lines to clear distinguished?

gruntydatsun Posted October 30, 2017
You're going to go way over the size limit for variables in AutoIt with that job unless you shuffle things around as you go. If this doesn't need to be automated, it might be less hassle to just do it in a text editor like Notepad++ (npp) or Sublime with find/replace and a regex.

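If holding the whole file in one variable did turn out to be a problem, one way to "shuffle things around as you go" would be to copy the file line by line. The following is only a sketch: the helper name `_BlankMatchingLines`, its parameters and the substring test are assumptions for illustration, not something posted in this thread.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: copies $sInPath to $sOutPath line by line and writes
; an empty line in place of every line that contains $sMarker.
; Returns the number of blanked lines; sets @error if a file cannot be opened.
Func _BlankMatchingLines($sInPath, $sOutPath, $sMarker)
    Local $hIn = FileOpen($sInPath, $FO_READ)
    If $hIn = -1 Then Return SetError(1, 0, 0)
    Local $hOut = FileOpen($sOutPath, $FO_OVERWRITE)
    If $hOut = -1 Then
        FileClose($hIn)
        Return SetError(2, 0, 0)
    EndIf

    Local $sLine, $iBlanked = 0
    While 1
        $sLine = FileReadLine($hIn)
        If @error Then ExitLoop ; end of file (or a read error)
        ; StringInStr is case-insensitive by default
        If StringInStr($sLine, $sMarker) Then
            $sLine = ""
            $iBlanked += 1
        EndIf
        FileWriteLine($hOut, $sLine) ; appends @CRLF, so line order and count are kept
    WEnd

    FileClose($hIn)
    FileClose($hOut)
    Return $iBlanked
EndFunc
```

Only one line is held in memory at a time, at the cost of being slower than a single regex pass over the whole text.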
mikell Posted October 30, 2017
49 minutes ago, gruntydatsun said: "you're going to go way over the size limit for variables in autoit"
It depends on the size of the lines.

    ; Build a test file of about 1.5 million lines.
    ; Remove the ";" in front of #cs and #ce to skip this step once 1.txt exists.
    ;#cs
    $txt = ""
    For $i = 0 To 1500000
        $txt &= "this is the line of text #" & $i & @CRLF
    Next
    FileWrite("1.txt", $txt)
    ;#ce

    $txt = FileRead("1.txt")
    ; remove text from lines ending with 12, 14, 16
    $new = StringRegExpReplace($txt, '(?m)^.*1[246]$', "")
    FileWrite("new.txt", $new)

way1000 Posted October 30, 2017 (Author)
Here's an example of the text file before and after.
Option: remove lines containing "items4".
There are 200k lines to be removed in 1 million lines. The removed lines have to stay empty (like in the screenshot); it's a must.

mikell Posted October 30, 2017
Did you try my previous code? A little adaptation gives this (based precisely on the provided requirements):

    $txt = FileRead("1.txt")
    $out = "items4"
    ; \Q...\E makes the regex treat $out as literal text
    $new = StringRegExpReplace($txt, '(?m)^.*\Q' & $out & '\E.*$', "")
    FileWrite("new.txt", $new)

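Since keeping the line count is a hard requirement here, it may be worth checking the result. The sketch below applies the same pattern to a small made-up sample (the data and variable names are assumptions) and relies on `@extended`, which both `StringRegExpReplace` and `StringReplace` set to the number of replacements they performed.

```autoit
; Small in-memory sample standing in for the real file (made-up data).
Local $sTxt = "line1,items1,a" & @CRLF & _
        "line2,items4,b" & @CRLF & _
        "line3,items3,c" & @CRLF & _
        "line4,items4,d" & @CRLF
Local $sOut = "items4"

Local $sNew = StringRegExpReplace($sTxt, '(?m)^.*\Q' & $sOut & '\E.*$', "")
Local $iBlanked = @extended ; number of lines that were blanked

; Replacing @CRLF with itself changes nothing but counts the line breaks.
StringReplace($sTxt, @CRLF, @CRLF)
Local $iBefore = @extended
StringReplace($sNew, @CRLF, @CRLF)
Local $iAfter = @extended

ConsoleWrite("Blanked: " & $iBlanked & ", line breaks before/after: " & _
        $iBefore & "/" & $iAfter & @CRLF)
```

If the two line-break counts differ, the pattern has consumed a newline somewhere and the line order guarantee no longer holds.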
AnonymousX Posted January 22, 2019
@mikell Thanks! Is there a way to do the opposite and only keep, say, "items4"?

mikell Posted January 23, 2019
If you don't want to grab the matches into an array using the usual StringRegExp, then to get the result as a string you have to introduce a kind of negation into the StringRegExpReplace, to say: if the lines do NOT contain "items4", then discard them.
Here is a way:

    $txt = "line1,items1,testtext1" & @CRLF & _
            "line2,items4,testtext2" & @CRLF & _
            "line3,items3,testtext3" & @CRLF & _
            "line4,items4,testtext4" & @CRLF & _
            "line5,items5,testtext5" & @CRLF & _
            "line6,items6,testtext6" & @CRLF
    $in = "items4"
    $res = StringRegExpReplace($txt, '(?m)^(.*\Q' & $in & '\E.*(*SKIP)(*F)|.*)$\R?', "")
    MsgBox(0, "", $res)

In this alternation, the left side first matches the lines containing "items4", then (*SKIP)(*F) says "No no, I don't want this". All other lines (not containing "items4") are then matched by the right side of the alternation and replaced by "".
Edit: This example doesn't replace the discarded lines with blank lines. To get blank lines, just remove \R? (which means an optional newline sequence).
Edited January 23, 2019 by mikell

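The array route mentioned at the start of the post above could look roughly like this. It is a sketch on the same sample data; the variable names are assumptions, and it collects the kept lines without preserving blank placeholders for the others.

```autoit
#include <StringConstants.au3>

Local $sTxt = "line1,items1,testtext1" & @CRLF & _
        "line2,items4,testtext2" & @CRLF & _
        "line3,items3,testtext3" & @CRLF & _
        "line4,items4,testtext4" & @CRLF & _
        "line5,items5,testtext5" & @CRLF & _
        "line6,items6,testtext6" & @CRLF
Local $sIn = "items4"

; With no capturing group, the global-match flag returns each full match,
; i.e. every whole line that contains the marker.
Local $aKeep = StringRegExp($sTxt, '(?m)^.*\Q' & $sIn & '\E.*$', $STR_REGEXPARRAYGLOBALMATCH)
If @error Then
    ConsoleWrite("No line contains " & $sIn & @CRLF)
Else
    For $i = 0 To UBound($aKeep) - 1
        ConsoleWrite($aKeep[$i] & @CRLF)
    Next
EndIf
```

This suits the case where only the matching lines are wanted; when the original line positions matter, the string replacement is the better fit.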
pixelsearch Posted January 23, 2019
Hi to both of you.
Mikell, to blank all lines except the "items4" ones, I just tried a "negative lookahead" (my first one!). Do you think it's correct? Based on your example:

    $txt = "line1,items1,testtext1" & @CRLF & _
            "line2,items4,testtext2" & @CRLF & _
            "line3,items3,testtext3" & @CRLF & _
            "line4,items4,testtext4" & @CRLF & _
            "line5,items5,testtext5" & @CRLF & _
            "line6,items6,testtext6" & @CRLF
    $res = StringRegExpReplace($txt, '(?m)^(?!.*items4).*$', "...")
    MsgBox(0, "Dots dots", $res)

The 3 dots "..." are here just to make blank lines clearly visible in the image. Replace with "" when desired.

Nine Posted January 23, 2019
Interesting @pixelsearch. I tried the following (I was thinking it was more intuitive) and it doesn't work. Can you explain?

    $res = StringRegExpReplace($txt, '(?mi)^.*(?!items4).*$', "...")

pixelsearch Posted January 23, 2019
Hi Nine.
Let's hope Mikell, Jchd or another RegExp guru will bring you the explanation you desire.
I was lucky enough to have this "negative lookahead" working, after I read that "you can use any regular expression inside the lookahead (note that this is not the case with lookbehind)".
A complementary question could be, in the preceding example: why does a negative lookahead (?!.* not return the same results as .*(?! when a positive lookahead (?=.* returns the same results as .*(?= ?
Edited January 23, 2019 by pixelsearch

iamtheky Posted January 23, 2019
Fun for adding multiple criteria as well (I think; this could be all wrong).

    $txt = "line1,items1,testtext1" & @CRLF & _
            "line2,items4,testtext2" & @CRLF & _
            "line3,items3,testtext3" & @CRLF & _
            "line4,items4,testtext4" & @CRLF & _
            "line5,items5,testtext5" & @CRLF & _
            "line6,items6,testtext6" & @CRLF
    ;~ $in = "(items4)"
    $in = "(items4|items5)"
    $s = StringRegExpReplace($txt, "(line.*" & $in & ".*?)\s", " ")
    MsgBox(0, '', $s)

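For comparison, multiple criteria can also be combined with the negative lookahead posted earlier in the thread. This is a sketch under the assumption that the goal is still "keep the listed items, blank everything else"; `$sKeep` is a hypothetical variable and is inserted as a regex alternation, not as literal text.

```autoit
Local $sTxt = "line1,items1,testtext1" & @CRLF & _
        "line2,items4,testtext2" & @CRLF & _
        "line3,items3,testtext3" & @CRLF & _
        "line4,items4,testtext4" & @CRLF & _
        "line5,items5,testtext5" & @CRLF & _
        "line6,items6,testtext6" & @CRLF

; Alternation of markers to keep (regex syntax, so escape special characters).
Local $sKeep = "items4|items5"

; Blank every line in which none of the markers can be found.
Local $sRes = StringRegExpReplace($sTxt, '(?m)^(?!.*(?:' & $sKeep & ')).*$', "")
Local $iBlanked = @extended ; number of lines blanked

ConsoleWrite($sRes)
```

The non-capturing group `(?:...)` keeps the alternation inside the lookahead, so adding a third marker only means extending `$sKeep`.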
mikell Posted January 23, 2019
7 hours ago, pixelsearch said: "Do you think it's correct ?"
Yes it is. Nice catch.
And if you know the reason why it works and how it operates, then you can easily answer Nine.

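The thread does not spell the answer out, so here is one way to see it, offered as a sketch rather than as mikell's own explanation. A lookahead is tested at the position the engine has reached. In `^(?!.*items4)` that position is the start of the line, so the whole line is inspected. In `^.*(?!items4)` the greedy `.*` has already run to the end of the line, where nothing follows, so the negative lookahead always succeeds. A positive lookahead behaves differently because `.*` can backtrack until it reaches a position where "items4" does follow.

```autoit
Local $sLine = "line2,items4,testtext2"

; Lookahead tested at the start of the line: "items4" is found further on,
; so the negative lookahead fails and the line is left untouched.
Local $sA = StringRegExpReplace($sLine, '(?m)^(?!.*items4).*$', "...")

; Lookahead tested after the greedy ".*" has consumed the whole line:
; nothing follows, the lookahead succeeds, and the line is replaced anyway.
Local $sB = StringRegExpReplace($sLine, '(?m)^.*(?!items4).*$', "...")

ConsoleWrite($sA & @CRLF & $sB & @CRLF)
```

The first call leaves the line unchanged and the second replaces it with the dots, which matches what Nine observed.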
AnonymousX Posted January 24, 2019
@mikell Thank you!!! Exactly what I was trying to do. Appreciate the explanation too.
Edit: you may be interested to know that with your help I was able to create a script that runs through 365 files, each with over 200,000 lines of data, and pinpoints all the key information into a separate file (about 300,000 lines long), with a program execution time of about 5 minutes. Man, I love AutoIt and its community! Thanks to everyone else as well.
Edited January 25, 2019 by AnonymousX