covaks Posted January 14, 2008 (edited)

Is there a quicker way to do this? I want to read one file (1.8 MB in size, 253k words, one per line) and remove all the words that are found in another file (which has about 12,000 words).

    #include <File.au3>
    #include <Array.au3>
    Dim $In[_FileCountLines("C:\wordlist.txt") + 1]
    Dim $Bad[_FileCountLines("C:\Badwords.txt") + 1]
    Dim $Out[_FileCountLines("C:\wordlist.txt") + 1]
    _FileReadToArray("C:\Badwords.txt",$Bad)
    _FileReadToArray("C:\wordlist.txt",$In)
    For $x = 1 to $Bad[0]
        For $y = 1 to $In[0]
            If $Bad[$x] = $In[$y] Then
            Else
                $Out[$y] = $In[$y]
            EndIf
        Next
    Next
    _FileWriteFromArray("C:\clean.txt",$Out)

Siao Posted January 14, 2008 (edited)

I wouldn't use an array for the output unless you need the data in an array for something other than writing it to a file. Concatenate into a single string and FileWrite it in one go after the loop. That should be more efficient.

Also, StringRegExpReplace (or, if your data is strictly one word per line as you say, StringInStr plus StringReplace) in place of the inner loop could be much faster than iterating through the whole of the larger array for each bad word. This means ditching the $In array too.

Siao Posted January 14, 2008 (edited)

Which would be something like this:

    #include <File.au3> ; provides _FileReadToArray
    Global $aBad, $sIn
    $sIn = FileRead("wordlist.txt")
    _FileReadToArray("badwords.txt", $aBad)
    For $x = 1 to $aBad[0]
        $sIn = StringRegExpReplace($sIn, '(\A|\n)' & $aBad[$x] & '(\r|\z)', '')
    Next
    FileWrite("clean.txt", $sIn)

or

    #include <File.au3> ; provides _FileReadToArray
    Global $aBad, $sIn
    $sIn = StringStripCR(FileRead("wordlist.txt")) & @LF
    _FileReadToArray("badwords.txt", $aBad)
    For $x = 1 to $aBad[0]
        $sIn = StringReplace($sIn, $aBad[$x] & @LF, '')
    Next
    If StringRight($sIn, 1) = @LF Then $sIn = StringTrimRight($sIn, 1)
    $sIn = StringReplace($sIn, @LF, @CRLF)
    FileWrite("clean.txt", $sIn)
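
Both versions above still scan the whole word list once per bad word. As a rough alternative sketch, each input word can instead be checked against a lookup table of bad words, so the large file is walked only once. This is not something proposed in the thread: the use of the Windows `Scripting.Dictionary` COM object, the variable names, and the choice to skip blank lines are all assumptions of this example. The file names are the ones used in the posts above.

```autoit
; Sketch only: filter wordlist.txt against badwords.txt with a dictionary lookup.
; Assumes the Scripting.Dictionary COM object is available on the machine.
Global $oBad = ObjCreate("Scripting.Dictionary")
If Not IsObj($oBad) Then Exit 1
$oBad.CompareMode = 1 ; text compare, so lookups ignore case like the "=" operator

; Load every bad word as a dictionary key (duplicates and blank lines are skipped).
Global $aBad = StringSplit(StringStripCR(FileRead("badwords.txt")), @LF)
For $i = 1 To $aBad[0]
    If $aBad[$i] <> "" And Not $oBad.Exists($aBad[$i]) Then $oBad.Add($aBad[$i], 1)
Next

; Keep an input word only if it is not a key in the dictionary.
Global $aIn = StringSplit(StringStripCR(FileRead("wordlist.txt")), @LF)
Global $sOut = ""
For $i = 1 To $aIn[0]
    If $aIn[$i] <> "" And Not $oBad.Exists($aIn[$i]) Then $sOut &= $aIn[$i] & @CRLF
Next

; FileWrite with a file name appends, so clean.txt should not already exist.
FileWrite("clean.txt", $sOut)
```

Unlike the `StringReplace` version, this compares whole lines, so a bad word such as "art" cannot match the tail of "cart". Whether it is actually faster on the 253k-word file would need to be measured.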

covaks Posted January 14, 2008

Thank you very much. :-)

glasglow Posted February 10, 2009

    StringReplace($strings,@CRLF&@CRLF,"")
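
One thing to watch with that one-liner: replacing a doubled `@CRLF` with an empty string also removes the line break between the two neighbouring words, and `StringReplace` returns the new string rather than changing `$strings` in place. A minimal sketch of one way to collapse blank lines while keeping one word per line is below. The sample text and the name `$sText` are made up for illustration, and using `StringRegExpReplace` here is this example's choice, not the poster's.

```autoit
; Sketch only: collapse runs of blank lines into a single line break.
; $sText is hypothetical sample data; real input would come from FileRead.
Global $sText = "alpha" & @CRLF & @CRLF & @CRLF & "beta" & @CRLF & "gamma"

; Two or more consecutive CRLF pairs become exactly one CRLF.
$sText = StringRegExpReplace($sText, "(\r\n){2,}", @CRLF)

; The three words now sit on consecutive lines.
ConsoleWrite($sText & @CRLF)
```

The `{2,}` quantifier handles any number of consecutive blank lines in one pass, which a single `StringReplace` on a doubled `@CRLF` would not.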