xdp22 Posted November 27, 2010

Hello, first of all sorry for my bad English (I will use a translator). I tried to make a script that compares two files and deletes every line that exists in both of them, but I don't know how to go about it. This is an example of how it should work. We have 2 files:

File1.txt
a
b
lol
c
d

File2.txt
b
a
lol2
d
c

The lines "a, b, c, d" exist in File1.txt and in File2.txt, so I want to make a program that deletes those lines. After running the program, the files should look like this:

File1.txt
lol

File2.txt
lol2

Thank you guys. I was really trying, but my code doesn't work ^^ Here it is, as proof that I was trying:

func lol1()
    $line1 = 0
    $line2 = 0
    $file1 = "test1.txt"
    $file2 = "test2.txt"
    $test1 = FileReadLine($baza1, $linia1 + 1)
    $test22 = FileReadLine($baza2, $linia2 + 1)
    _FileReadToArray($test1, $test2)
    if stringinstr($test1, $test2) then _FileWriteToLine($test1, "", "", $linia1 + 1)
    lol1()
EndFunc
SadBunny Posted November 27, 2010 (edited)

If only you had Linux (or Cygwin) you could do this a whole lot easier. But this is a Windows, Windows world...

#include <Array.au3>

Dim $array1[3]
$array1[0] = "a"
$array1[1] = "b"
$array1[2] = "c"

Dim $array2[3]
$array2[0] = "b"
$array2[1] = "c"
$array2[2] = "d"

deduplicate($array1, $array2)

_ArrayDisplay($array1)
_ArrayDisplay($array2)

Exit

Func deduplicate(ByRef $ar1, ByRef $ar2)
    $posInArray2 = 0
    For $i = 0 To UBound($ar1) - 1
        ; look for $ar1[$i] in $ar2
        $posInArray2 = _ArraySearch($ar2, $ar1[$i])
        If $posInArray2 > -1 Then
            ; if found, delete line from both arrays
            _ArrayDelete($ar1, $i)
            _ArrayDelete($ar2, $posInArray2)
            ; ... we need to search the same element again, because the next element just became the current element :)
            $i -= 1
        EndIf
        ; $ar1 could be smaller than before, so the loop might go out of bounds for $ar1. If so, quit loop, we're done.
        If $i >= UBound($ar1) - 1 Then Return
    Next
EndFunc

EDIT: This would only work if the files contain unique lines. So don't use this for something serious!

Edited November 27, 2010 by SadBunny
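For reference, SadBunny's function can be hooked up to actual files in the spirit of the first post. This is an untested sketch (file names assumed from the original question); note that _FileReadToArray stores the line count in element 0, which deduplicate() does not expect, so that element is stripped first:

#include <Array.au3>
#include <File.au3>

Local $aFile1, $aFile2
If Not _FileReadToArray("File1.txt", $aFile1) Then Exit 1
If Not _FileReadToArray("File2.txt", $aFile2) Then Exit 2

; drop the line-count element so the arrays hold lines only
_ArrayDelete($aFile1, 0)
_ArrayDelete($aFile2, 0)

deduplicate($aFile1, $aFile2) ; SadBunny's function from above

; overwrite the original files with whatever survived
_FileWriteFromArray("File1.txt", $aFile1)
_FileWriteFromArray("File2.txt", $aFile2)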
Tvern Posted November 27, 2010

I don't know how big these files are or how often the function will be called. This example will work, but it is not very fast, because it opens and closes each file twice and uses _ArrayDelete. It can be made faster, but it'll get a little more complicated.

#include <File.au3>
#include <Array.au3>

_RemoveDuplicateLines("test1.txt", "test2.txt")

Func _RemoveDuplicateLines($sFilePath1, $sFilePath2)
    Local $aFile1, $aFile2, $hFile1, $hFile2
    If Not _FileReadToArray($sFilePath1, $aFile1) Then Return SetError(1, 1, 0)
    If Not _FileReadToArray($sFilePath2, $aFile2) Then Return SetError(1, 2, 0)
    For $i = $aFile1[0] To 1 Step -1
        For $j = $aFile2[0] To 1 Step -1
            If $aFile1[$i] = $aFile2[$j] Then
                _ArrayDelete($aFile1, $i)
                _ArrayDelete($aFile2, $j)
                $aFile1[0] -= 1
                $aFile2[0] -= 1
                If Not $aFile1[0] Then ExitLoop 2 ;no point checking further once one file is completely empty
                If Not $aFile2[0] Then ExitLoop 2
                ExitLoop
            EndIf
        Next
    Next
    $aFile1[0] = ""
    $aFile2[0] = ""
    _FileWriteFromArray($sFilePath1, $aFile1, 1)
    _FileWriteFromArray($sFilePath2, $aFile2, 1)
EndFunc

The example is case-insensitive.
xdp22 Posted November 27, 2010 (edited)

Thanks for both replies, I will test these two versions, thanks very much.

@UP These files are big, I think at least 1000 lines, will that be OK?

Edit: the version from Tvern works, but not always (but thanks), now I will test SadBunny's version.

Edit: Thanks very much, could this work a little faster? xD (but it's not bad, not bad, I'm just asking)

Edited November 27, 2010 by xdp22
Tvern Posted November 27, 2010 Share Posted November 27, 2010 If with not always, you mean is doesn't remove duplicate lines in $aFile2, remove the ExitLoop. Link to comment Share on other sites More sharing options...
JohnOne Posted November 27, 2010

Sorry to be off topic, but any chance you could give me the link to that translator you use? Seems really good.
xdp22 Posted November 27, 2010 (edited)

It was just Google Translator, here you are - http://translate.google.com
But it's not really good, trust me.

@Tvern Can you delete that? Because this code is too advanced for me; if I delete something, it will no longer work. Thank you ^^

Edited November 27, 2010 by xdp22
Tvern Posted November 27, 2010 Share Posted November 27, 2010 I realised that just removing that line would not work anyways, Try this: I've commented it, so hopefully you'll understand how it works. #include<file.au3> #include<array.au3> _RemoveDuplicateLines("test1.txt", "test2.txt") Func _RemoveDuplicateLines($sFilePath1, $sFilePath2) Local $aFile1, $aFile2, $hFile1, $hFile2, $fFound ;declare vars If Not _FileReadToArray($sFilePath1, $aFile1) Then Return SetError(1,1,0) ;only continue if the first file can be read If Not _FileReadToArray($sFilePath2, $aFile2) Then Return SetError(1,2,0) ;only continue if the second file can be read For $i = $aFile1[0] To 1 Step -1 ;loop through the first file array. (backwards is best when using _ArrayDelete) $fFound = False For $j = $aFile2[0] To 1 Step -1 ;loop through the second array If $aFile1[$i] = $aFile2[$j] Then ;if an entry from the first array matches one from the second... $fFound = True ;set the found flag, so the entry can be deleted from the first array later. _ArrayDelete($aFile2,$j) ;delete from the second array $aFile2[0] -= 1 ;reduce count by 1 If Not $aFile2[0] Then ExitLoop 2 ;exit if one file is empty EndIf Next If $fFound Then ;if a match was found.. _ArrayDelete($aFile1,$i) ;delete from first array $aFile1[0] -= 1 ;reduce count If Not $aFile1[0] Then ExitLoop ;exit if one file is empty EndIf Next $aFile1[0] = "" $aFile2[0] = "" _FileWriteFromArray($sFilePath1,$aFile1,1) _FileWriteFromArray($sFilePath2,$aFile2,1) EndFunc Link to comment Share on other sites More sharing options...
JohnOne Posted November 27, 2010 Share Posted November 27, 2010 It was just Google Translator, here you are - http://translate.google.comBut it's not good really trust me.@TvernCan u delete that? cuz this code is too advanced for me, i delete something, and this will no more work thank you ^^Seriously, it must be good, if your posts have been run through it.Are you using that traslator or no? AutoIt Absolute Beginners Require a serial Pause Script Video Tutorials by Morthawt ipify Monkey's are, like, natures humans. Link to comment Share on other sites More sharing options...
jchd Posted November 27, 2010

I believe there is a saddle curve in file sizes S1, S2 where the naïve approach (arrays, loop, complexity of S1*S2 comparisons plus an unknown number of _ArrayDelete calls) is slower than a dictionary or even an SQLite implementation. Using either a hash table or a B-tree will put the complexity somewhere in the vicinity of S1*log(S2) + S2*log(S1).

Also try to avoid repeated calls to _ArrayDelete in the case where common strings are likely, since it's a rather lengthy function. A better way is either to make the array 2D and use a "found elsewhere" marker or (probably faster) to empty the found strings in place and avoid copying them on output. Also use == where possible instead of = to avoid invoking lengthy underlying code.

Sorry, I don't have time to make examples of any of those alternative implementations.
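To make the dictionary route concrete, here is a rough, untested sketch of what it could look like in AutoIt using a COM Scripting.Dictionary (hashed lookups, so the total work stays near S1 + S2 rather than S1*S2). The function name is invented, it assumes lines are unique within each file (like SadBunny's example), and note that Scripting.Dictionary compares keys case-sensitively by default:

#include <File.au3>

_RemoveCommonLines("test1.txt", "test2.txt")

Func _RemoveCommonLines($sFile1, $sFile2)
    Local $aFile1, $aFile2
    If Not _FileReadToArray($sFile1, $aFile1) Then Return SetError(1, 1, 0)
    If Not _FileReadToArray($sFile2, $aFile2) Then Return SetError(1, 2, 0)

    ; index every line of file 2 for fast hashed membership tests
    Local $oDict2 = ObjCreate("Scripting.Dictionary")
    For $i = 1 To $aFile2[0]
        If Not $oDict2.Exists($aFile2[$i]) Then $oDict2.Add($aFile2[$i], $i)
    Next

    ; keep the file-1 lines that file 2 does not contain; remember the matched ones
    Local $oMatched = ObjCreate("Scripting.Dictionary")
    Local $sOut1 = ""
    For $i = 1 To $aFile1[0]
        If $oDict2.Exists($aFile1[$i]) Then
            If Not $oMatched.Exists($aFile1[$i]) Then $oMatched.Add($aFile1[$i], 1)
        Else
            $sOut1 &= $aFile1[$i] & @CRLF
        EndIf
    Next

    ; keep the file-2 lines that were never matched by a file-1 line
    Local $sOut2 = ""
    For $i = 1 To $aFile2[0]
        If Not $oMatched.Exists($aFile2[$i]) Then $sOut2 &= $aFile2[$i] & @CRLF
    Next

    ; overwrite both files with the surviving lines
    Local $h1 = FileOpen($sFile1, 2) ; mode 2 = write, erase previous contents
    FileWrite($h1, $sOut1)
    FileClose($h1)
    Local $h2 = FileOpen($sFile2, 2)
    FileWrite($h2, $sOut2)
    FileClose($h2)
EndFunc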
MvGulik Posted November 28, 2010 (edited)

whatever

Edited February 7, 2011 by MvGulik
jchd Posted November 28, 2010 Share Posted November 28, 2010 By mere curiosity, I ran the following test: input file F1 = 7815 lines, average 631 chars, total size 4934777 chars input file F2 = same as F1 but with only 11 lines modified (so 7804 lines found in F1) input file F3 = same as F1 but with only 3 line unchanged (leaving 7812 unique lines) Run time from Scite: 16.891 on a stock PC, which I feel isn't as bad as it can seem given the task at hand. There is still ample room for optimizations. This code will hapily process any number of input files with complexity T * log(T) with T = total # of input lines in all files. expandcollapse popup#include <SQLite.au3> #include <SQLite.Dll.au3> #include <Array.au3> Main() ; removes every occurence of same (without respect to lower ASCII case, see below) text line found elsewhere in a group of text files Func Main() ; init SQLite _SQLite_Startup() ; create a :memory: DB Local $hDB = _SQLite_Open() ; create a single table, with an index on text and a trigger to delete strings "found elsewhere" right after insert ; doing so will minimize the number of comparisons, and those compares are fast low-level code ; ; WARNING: this will work as intended for lower ASCII without respect to case ; Unicode compares *-with-* respect to case can be done efficiently by using COLLATE BINARY instead of NOCASE ; universal Unicode compares without respect to case need a bit more complex setup (but can still be called efficient) _SQLite_Exec($hDB, "CREATE TABLE Strings (LineNum INTEGER, Source INTEGER, String CHAR COLLATE NOCASE, " & _ "PRIMARY KEY (LineNum, Source));" & _ "CREATE INDEX ixString ON Strings (String COLLATE NOCASE);" & _ "CREATE TRIGGER trInsString AFTER INSERT ON Strings FOR EACH ROW " & _ "WHEN exists (select 1 from Strings where String = new.String and Source != new.Source) " & _ "BEGIN " & _ "delete from Strings where String = new.String;" & _ "END;") ; get the list of input files (may process any number of files in the same run) Local $files = _FileListToArray(@ScriptDir & "\", '*.inputtxt', 1) If @error Then Return ; process input files Local $txtstr For $i = 1 to $files[0] _FileReadToArray($files[$i], $txtstr) ; process input lines If Not @error Then For $j = 1 To $txtstr[0] _SQLite_Exec($hDB, "insert into Strings (Linenum, Source, String) values (" & $j & "," & $i & "," & _SQLite_Escape($txtstr[$j]) & ");") Next EndIf Next ; store remaining data in output files Local $nrows, $ncols For $i = 1 to $files[0] ; select relevant strings left _SQLite_GetTable($hDB, "select String from Strings where Source = " & $i & ";", $txtstr, $nrows, $ncols) ; write to input filename + extra extension .uniq _FileWriteFromArray($files[$i] & '.uniq', $txtstr, 2) Next EndFunc This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). 
jchd Posted November 28, 2010 Share Posted November 28, 2010 Apologies to update myself.I ran a simpler version which is as close as fastest than I believe an SQLite implementation can be.Results from a run over a set of 15 XML files totalling 2 696 614 lines and 81518277 bytes, details below.D:\somepath>wc *.xml Lines Words Bytes 500 597 15195 F2008-08rpleb.xml 4077 4870 124056 F2008-09pckbx.xml 114008 135374 3441076 F2008-10dykxv.xml 161906 192098 4905017 F2008-11oowcq.xml 180883 214616 5481306 F2008-12wrnvy.xml 140594 166830 4261162 F2009-01wtiyu.xml 152628 181129 4603496 F2009-02baozh.xml 198300 234930 5993839 F2009-03mfbbb.xml 199248 235940 6011069 F2009-04vykln.xml 215067 255011 6497965 F2009-05fphdx.xml 204878 242848 6198099 F2009-06ndtvq.xml 93342 110597 2828989 F2009-07hqnms.xml 180302 213717 5445789 F2009-08pgriq.xml 221067 262122 6674143 F2009-09phjch.xml 205817 244028 6222960 F2009-10jqmsg.xml 227406 269189 6872148 F2009-11sclnn.xml 196591 232751 5941968 F2009-12lgzkm.xml 2696614 3196647 81518277 total 6 011 069 F2009-04vykln.xml 5 941 968 F2009-12lgzkm.xml 15 195 F2008-08rpleb.xml 6 222 960 F2009-10jqmsg.xml 124 056 F2008-09pckbx.xml 6 674 143 F2009-09phjch.xml 3 441 076 F2008-10dykxv.xml 5 445 789 F2009-08pgriq.xml 4 905 017 F2008-11oowcq.xml 2 828 989 F2009-07hqnms.xml 5 481 306 F2008-12wrnvy.xml 6 198 099 F2009-06ndtvq.xml 4 261 162 F2009-01wtiyu.xml 6 497 965 F2009-05fphdx.xml 4 603 496 F2009-02baozh.xml 5 993 839 F2009-03mfbbb.xml 6 872 148 F2009-11sclnn.xml 157 815 F2009-03mfbbb.xml.uniq 178 632 F2009-12lgzkm.xml.uniq 158 642 F2009-04vykln.xml.uniq 124 052 F2009-01wtiyu.xml.uniq 169 914 F2009-05fphdx.xml.uniq 166 560 F2008-12wrnvy.xml.uniq 167 236 F2009-06ndtvq.xml.uniq 150 090 F2008-11oowcq.xml.uniq 75 764 F2009-07hqnms.xml.uniq 108 093 F2008-10dykxv.xml.uniq 151 001 F2009-08pgriq.xml.uniq 5 886 F2008-09pckbx.xml.uniq 181 401 F2009-09phjch.xml.uniq 846 F2008-08rpleb.xml.uniq 176 872 F2009-10jqmsg.xml.uniq 119 988 F2009-02baozh.xml.uniq 188 160 F2009-11sclnn.xml.uniqIn this set, there are a very large number or dupplicate lines both within the same file and among files.This version makes no atempt to delete dupplicate lines during insert, but instead extracts, at the output stage, those lines which have no copy elsewhere.That turned out to make insertion about twice as fast (compared to the previous version using an insertion trigger) but only slowed down output a little (thanks to good indexing choice). 
This also demonstrates that you can have duplicate primary keys and ignore the row being inserted if it already exists in the DB.>Exit code: 0 Time: 1241.077I challenge making it significantly faster by using only vanilla AutoIt-provided resources/UDFs, specially when run over a large set of large files as the one examplified above.I cheated in making the index use a (faster) binary compare (not a case-insensitive one), but Note that the code is rather simple, straitforward and naturally copes with as many files as needed.expandcollapse popup#include <SQLite.au3> #include <SQLite.Dll.au3> #include <Array.au3> Main() ; removes every occurence of exact same (with respect to case, see below) text line found elsewhere in a group of text files Func Main() ; init SQLite _SQLite_Startup() ; create a :memory: DB Local $hDB = _SQLite_Open() ; create a single table, with an index on text ; doing so will minimize the number of comparisons, and those compares are fast low-level code ; ; WARNING: this will work as intended, for ASCII or Unicode, with respect to case ; lower ASCII compares *-without-* respect to case can still be done efficiently by using COLLATE NOCASE ; universal Unicode compares without respect to case need a bit more complex setup (but can still be called efficient) _SQLite_Exec($hDB, "CREATE TABLE Strings (String CHAR, Source INTEGER, PRIMARY KEY (String, Source) ON CONFLICT IGNORE);") ; get the list of input files (may process any number of files in the same run) Local $dir = "your input path" Local $files = _FileListToArray(@ScriptDir & "\", '*.inputtxt', 1) If @error Then Return ; process input files Local $txtstr For $i = 1 to $files[0] ConsoleWrite("Processing file " & $dir & $files[$i] & @LF) _FileReadToArray($dir & $files[$i], $txtstr) ; process input lines _SQLite_Exec($hDB, "begin;") If Not @error Then For $j = 1 To $txtstr[0] _SQLite_Exec($hDB, "insert into Strings (Source, String) values (" & $i & "," & _SQLite_Escape($txtstr[$j]) & ");") Next EndIf _SQLite_Exec($hDB, "commit;") Next ; store remaining data in output files Local $nrows, $ncols ConsoleWrite("Creating output files" & @LF) For $i = 1 to $files[0] ; select relevant strings left _SQLite_GetTable($hDB, "select String from Strings X where " & _ "Source = " & $i & " and " & _ "not exists (select 1 from Strings Y where Y.String = X.String and Y.Source != X.Source);", _ $txtstr, $nrows, $ncols) ; write to input filename + extra extension .uniq _FileWriteFromArray($dir & $files[$i] & '.uniq', $txtstr, 2) Next EndFunc ZombieKillz 1 This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. 
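The ON CONFLICT IGNORE clause on the primary key is what makes duplicate inserts vanish silently. A minimal sketch to see that behaviour in isolation, using the same SQLite UDFs (table and column names invented for the demo):

#include <SQLite.au3>
#include <SQLite.Dll.au3>

_SQLite_Startup()
Local $hDB = _SQLite_Open() ; in-memory DB

_SQLite_Exec($hDB, "CREATE TABLE t (s CHAR, src INTEGER, PRIMARY KEY (s, src) ON CONFLICT IGNORE);")
_SQLite_Exec($hDB, "INSERT INTO t VALUES ('a', 1);")
_SQLite_Exec($hDB, "INSERT INTO t VALUES ('a', 1);") ; duplicate key pair: silently ignored, no error
_SQLite_Exec($hDB, "INSERT INTO t VALUES ('a', 2);") ; different Source: kept

Local $aRow
_SQLite_QuerySingleRow($hDB, "SELECT count(*) FROM t;", $aRow)
ConsoleWrite("rows: " & $aRow[0] & @LF) ; prints: rows: 2

_SQLite_Close($hDB)
_SQLite_Shutdown()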
MvGulik Posted November 28, 2010 (edited)

whatever

Edited February 7, 2011 by MvGulik
jchd Posted November 29, 2010

It's an example of how simple and powerful using a low-level but efficient DB like SQLite can be in an otherwise non-DB application. Add error checking as required; the code was made in a rush.
ade Posted April 2, 2011

I know this post is old and wasn't sure whether to start a new thread referencing this post or to just reply at the bottom of this one. Apologies in advance if this offends anyone!

The code by jchd is really nice and I have tried using it, adapting it slightly, but to no avail. I am going around in circles and so have posted here in the hope that someone more knowledgeable than me will help out.

What I would like to do is remove the duplicates, but not ALL of the duplicated instances, leaving one of the duplicates intact. Is it possible to write the SQLite query so that it does that, and if so, what would it be?

Thanks!
Tvern Posted April 2, 2011

It's usually better to start a new thread and link to threads that might be relevant.

What you ask for sounds like the way the examples already work: they ensure that each remaining entry is unique and that the result contains all unique entries. If you mean you want to allow one or more duplicates, then I think my example would be easier to adjust; I suspect the SQLite example would become a great deal slower if you found a way to make it work (but I am not that familiar with SQLite and there might be an effective way to do it yet).

If you want to adjust my example, you should look at changing $fFound from a boolean to an int, increasing its value for each match found and then deleting values once the number reaches the upper limit you want to allow.

I'm going to bed now, but I'll see if I can have a look tomorrow. (I was going to look into a more efficient _ArrayDelete anyway, so this would make for a good reason to do that.)
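A very rough, untested sketch of where that counter would replace the flag in the inner loops of the earlier _RemoveDuplicateLines function ($iAllow is an invented name for the number of copies to keep; the exact rule for which copy survives would still need thought):

Local $iAllow = 1 ; keep this many copies of each duplicated line
For $i = $aFile1[0] To 1 Step -1
    $iFound = 0 ; was: $fFound = False
    For $j = $aFile2[0] To 1 Step -1
        If $aFile1[$i] = $aFile2[$j] Then
            $iFound += 1
            If $iFound > $iAllow Then ; only start deleting past the allowed count
                _ArrayDelete($aFile2, $j)
                $aFile2[0] -= 1
            EndIf
        EndIf
    Next
    If $iFound > $iAllow Then ; same rule for the first array
        _ArrayDelete($aFile1, $i)
        $aFile1[0] -= 1
    EndIf
Next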
jchd Posted April 3, 2011

@ade Next time start a new thread so you have a better chance of attracting eyes. Anyway, I would find it easier if you could restate your distinct problem in your own words, preferably with a short example of the sample inputs (a few lines) and the intended result covering all your practical cases.