jchd Posted October 4, 2019

🙂 It's even possible to extract the wanted lines by processing the whole file in one go.

    #include <FileConstants.au3> ; defines $FO_OVERWRITE

    ; Delete every five-character line in which one character occurs three times, write the rest.
    Local $hFilehandle = FileOpen(@ScriptDir & "\less.txt", $FO_OVERWRITE)
    FileWrite($hFilehandle, StringRegExpReplace(FileRead(@ScriptDir & "\more.txt"), "(?m)(?|(^(.)\2\2..\R?)|(^(.)\2.\2.\R?)|(^(.)\2..\2\R?)|(^(.).\2\2.\R?)|(^(.).\2.\2\R?)|(^(.)..\2\2\R?)|(^.(.)\2\2.\R?)|(^.(.)\2.\2\R?)|(^.(.).\2\2\R?)|(^..(.)\2\2\R?))", ""))
    FileClose($hFilehandle)

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
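A minimal sketch of the same idea on an in-memory string, so the effect of the pattern can be checked without touching any files. The sample lines and the variable names ($sSample, $sPattern, $sKept) are made up for illustration; the pattern itself is the one from the post above.

```autoit
; Illustrative sample: two lines contain a character three times, two do not.
Local $sSample = "AAABC" & @CRLF & "ABCDE" & @CRLF & "ABABA" & @CRLF & "ABCDA" & @CRLF

; Same pattern as in the post: one alternative per way of placing
; three identical characters in a five-character line.
Local $sPattern = "(?m)(?|(^(.)\2\2..\R?)|(^(.)\2.\2.\R?)|(^(.)\2..\2\R?)|(^(.).\2\2.\R?)|(^(.).\2.\2\R?)|(^(.)..\2\2\R?)|(^.(.)\2\2.\R?)|(^.(.)\2.\2\R?)|(^.(.).\2\2\R?)|(^..(.)\2\2\R?))"

; Matching lines are replaced by nothing, so only the other lines remain.
Local $sKept = StringRegExpReplace($sSample, $sPattern, "")
ConsoleWrite($sKept)
```

With this sample, AAABC and ABABA should be dropped, while ABCDE and ABCDA (no character occurring three times) should remain.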
Alitai (Author) Posted October 5, 2019

And here is the code to remove 2 doubles. I know it's long, but I can't shorten it...

    #include <FileConstants.au3>
    #include <AutoItConstants.au3>
    #include <Array.au3>
    #include <File.au3>

    local $aFile
    _FileReadToArray(@ScriptDir &"\Zahlen kuerzer.txt", $aFile)

    $sFileName2 = @ScriptDir &"\Zahlen noch kuerzer.txt"
    $hFilehandle2 = FileOpen($sFileName2, $FO_OVERWRITE)

    Local $sOriginal
    Local $i = 0 ; offset into the line, always 0 here
    $str = 0

    ; 11715600 is the hard-coded number of lines in the input file ($aFile[0] holds the actual count)
    While ($str <> 11715600)
        $str += 1
        $sOriginal = $aFile[$str]
        ; The inner loop runs once; ExitLoop without a FileWrite means "skip this line"
        While 1
            $sOne = StringMid($sOriginal, $i+1, 1)
            $sTwo = StringMid($sOriginal, $i+2, 1)
            $sThree = StringMid($sOriginal, $i+3, 1)
            $sFour = StringMid($sOriginal, $i+4, 1)
            $sFive = StringMid($sOriginal, $i+5, 1)
            If $sTwo = $sFive Then
                If $sThree = $sFour Then ExitLoop
            EndIf
            If $sTwo = $sFour Then
                If $sThree = $sFive Then ExitLoop
            EndIf
            If $sTwo = $sThree Then
                If $sFour = $sFive Then ExitLoop
            EndIf
            If $sOne = $sFive Then
                If $sThree = $sFour Then ExitLoop
            EndIf
            If $sOne = $sFour Then
                If $sThree = $sFive Then ExitLoop
            EndIf
            If $sOne = $sThree Then
                If $sFour = $sFive Then ExitLoop
            EndIf
            If $sOne = $sFive Then
                If $sTwo = $sFour Then ExitLoop
            EndIf
            If $sOne = $sFour Then
                If $sTwo = $sFive Then ExitLoop
            EndIf
            If $sOne = $sTwo Then
                If $sFour = $sFive Then ExitLoop
            EndIf
            If $sOne = $sFive Then
                If $sTwo = $sThree Then ExitLoop
            EndIf
            If $sOne = $sThree Then
                If $sTwo = $sFive Then ExitLoop
            EndIf
            If $sOne = $sTwo Then
                If $sThree = $sFive Then ExitLoop
            EndIf
            If $sOne = $sFour Then
                If $sTwo = $sThree Then ExitLoop
            EndIf
            If $sOne = $sThree Then
                If $sTwo = $sFour Then ExitLoop
            EndIf
            If $sOne = $sTwo Then
                If $sThree = $sFour Then ExitLoop
            Else
                FileWrite($hFilehandle2, $sOriginal & @CRLF)
                ExitLoop
            EndIf
            FileWrite($hFilehandle2, $sOriginal & @CRLF)
            ExitLoop
        WEnd
    WEnd
    FileClose($hFilehandle2)
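The fifteen checks above cover every way of choosing two disjoint pairs of positions among five characters, so for five-character lines they amount to asking whether the line contains two pairs of identical characters. A shorter sketch of that test, under that assumption, counts how often each distinct character occurs. The function name _HasTwoPairs and the sample strings are hypothetical and not from the thread. Note also that it compares case-sensitively (like the regular expressions in this thread), whereas the `=` operator used above is case-insensitive for strings.

```autoit
; Hypothetical helper: True if $sLine contains two disjoint pairs of identical
; characters (e.g. AABBC, ABABX, AAAAB), compared case-sensitively.
Func _HasTwoPairs($sLine)
    Local $iPairs = 0, $sSeen = "", $sChar, $iCount
    For $i = 1 To StringLen($sLine)
        $sChar = StringMid($sLine, $i, 1)
        ; Count each distinct character only once.
        If StringInStr($sSeen, $sChar, 1) Then ContinueLoop
        $sSeen &= $sChar
        ; StringReplace reports the number of replacements in @extended.
        StringReplace($sLine, $sChar, "", 0, 1)
        $iCount = @extended
        $iPairs += Int($iCount / 2) ; 2 or 3 occurrences = 1 pair, 4 or 5 = 2 pairs
    Next
    Return $iPairs >= 2
EndFunc   ;==>_HasTwoPairs

ConsoleWrite(_HasTwoPairs("AABBC") & @CRLF) ; two pairs
ConsoleWrite(_HasTwoPairs("AAABC") & @CRLF) ; one triple only
```

Inside the outer loop, the whole inner While block could then be reduced to something like `If Not _HasTwoPairs($sOriginal) Then FileWrite($hFilehandle2, $sOriginal & @CRLF)`.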
Malkey Posted October 5, 2019 (edited)

Quote from Alitai's post above:

    And here is the code to remove 2 doubles. .....

This script removes all entries with two pairs of identical characters from a million random five-character entries.

    ; https://www.autoitscript.com/forum/topic/200448-script-remove-all-chars-and-numbers-higher-then-2-from-a-block-of-5/?do=findComment&comment=1438027

    ;#cs
    ; ---------------- Create test data file -----------------
    Local $c = ""
    For $i = 1 To 1000000
        $c &= Chr(Random(65, 70, 1)) & Chr(Random(65, 70, 1)) & Chr(Random(65, 70, 1)) & Chr(Random(65, 70, 1)) & Chr(Random(65, 70, 1)) & @CRLF
    Next
    FileWrite(@ScriptDir & "\more1.txt", $c)
    ; ---------------- End of Creating test data file -----------------
    ;#ce

    Local $hTimer = TimerInit() ; Begin the timer and store the handle in a variable.
    Local $hFilehandle = FileOpen(@ScriptDir & "\less.txt", 2) ; $FO_OVERWRITE (2)

    ; Remove lines with 3 of the same character present.
    ;FileWrite($hFilehandle, StringRegExpReplace(FileRead(@ScriptDir & "\more1.txt"), ".*?(.)(?=(.*?\1){2}).*\R?", "")) ; Mikell's from next post

    ;#cs
    ; Remove lines with 2 pair of characters present.
    FileWrite($hFilehandle, StringRegExpReplace(FileRead(@ScriptDir & "\more1.txt"), "(?m)(?|" & _
            "^.(.)(.)\2\1\R?|" & _ ; XABBA - These comments are examples of entries that will be removed.
            "^(.).(.)\2\1\R?|" & _ ; AXBBA
            "^(.)(.).\2\1\R?|" & _ ; ABXBA
            "^(.)(.)\2.\1\R?|" & _ ; ABBXA
            "^(.)(.)\2\1.\R?|" & _ ; ABBAX
            "^.(.)(.)\1\2\R?|" & _ ; XABAB
            "^(.).(.)\1\2\R?|" & _ ; AXBAB
            "^(.)(.).\1\2\R?|" & _ ; ABXAB
            "^(.)(.)\1.\2\R?|" & _ ; ABAXB
            "^(.)(.)\1\2.\R?|" & _ ; ABABX
            "^.(.)\1(.)\2\R?|" & _ ; XAABB
            "^(.).\1(.)\2\R?|" & _ ; AXABB
            "^(.)\1.(.)\2\R?|" & _ ; AAXBB
            "^(.)\1(.).\2\R?|" & _ ; AABXB
            "^(.)\1(.)\2.\R?)", _  ; AABBX
            ""))
    ;#ce
    FileClose($hFilehandle)

    ConsoleWrite(Round(TimerDiff($hTimer) / 1000, 3) & "sec" & @CRLF)

    ShellExecuteWait(@ScriptDir & "\less.txt")
    FileDelete(@ScriptDir & "\more1.txt")

Edited October 7, 2019 by Malkey: Changed from relative to absolute backreferencing in the RE pattern, and added comments to the RE pattern.
Edited Saturday at 06:41 PM by Malkey: Added 5 more possibilities to the StringRegExpReplace pattern, and emphasised that the aim is to remove 2 doubles.
mikell Posted October 5, 2019 (edited)

My 2 cents try. Seems to work, but I didn't test the speed...

    ; remove all lines which have 3 or more of the same chars
    $tmp = StringRegExpReplace($txt, '.*?(.)(?=(.*?\1){2}).*\R?', "")

Edited October 5, 2019 by mikell
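How the pattern works, as far as it can be read from the expression itself: `.*?(.)` lazily picks a candidate character on the line, the lookahead `(?=(.*?\1){2})` requires that character to occur at least twice more further along, and `.*\R?` then consumes the rest of the line and its line break. Nothing in it depends on the line being five characters long. A small sketch on made-up sample data ($sSample and $sKept are illustrative names, not from the thread):

```autoit
; Illustrative sample; the 7-character line shows the pattern is not tied to length 5.
Local $sSample = "AAABC" & @CRLF & "ABCDE" & @CRLF & "QWQQ9" & @CRLF & "ABCABCA" & @CRLF & "AABBC" & @CRLF

; mikell's pattern: drop every line in which some character occurs 3 or more times.
Local $sKept = StringRegExpReplace($sSample, '.*?(.)(?=(.*?\1){2}).*\R?', "")
ConsoleWrite($sKept)
```

Here ABCDE and AABBC should survive: AABBC has two pairs but no character occurring three times, which is exactly the case the later two-pairs patterns in this thread address.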
jchd Posted October 5, 2019

I once thought of something like that but feared slowness due to multiple backtracking. In practice and contrary to intuition, this isn't so, and @mikell's pattern is 15-20% faster than the explicit alternation I used. PCRE2 with JIT could possibly reverse the result, but I can't test that right now.

Note that there is a spurious hidden character (0xFFFE) between ? and ' at the end of @mikell's pattern in the above code snippet.
mikell Posted October 5, 2019

5 hours ago, jchd said:

    In practice and contrary to intuition, this isn't so

Hmm, not sure. Jan Goyvaerts says: "The alternation operator has the lowest precedence of all regex operators."
So I tried the first expression you posted, using one backreference only and no group \1 in the alternations, and this one is indeed faster.

    $txt = "9999V" & @crlf & _
        "9999B" & @crlf & _
        "9999N" & @crlf & _
        "9999M" & @crlf & _
        "99992" & @crlf & _
        "99993" & @crlf & _
        "99994" & @crlf & _
        "99995" & @crlf & _
        "99996" & @crlf & _
        "99997" & @crlf & _
        "99998" & @crlf & _
        "99999" & @crlf & _
        "QQW4N" & @crlf & _
        "QQW4M" & @crlf & _
        "QQW42" & @crlf & _
        "QQW43" & @crlf & _
        "QQW44" & @crlf & _
        "QQW45" & @crlf & _
        "QQW46" & @crlf & _
        "QQW47" & @crlf & _
        "QQW48" & @crlf & _
        "QVRNG" & @crlf & _
        "QVRNF" & @crlf & _
        "QVRND" & @crlf & _
        "QVRNY" & @crlf & _
        "QVRNX" & @crlf & _
        "QVRNC" & @crlf & _
        "QVRNV" & @crlf & _
        "QVRNB" & @crlf & _
        "QVRNN" & @crlf & _
        "QVRNM" & @crlf & _
        "QVRN2" & @crlf & _
        "YDRDK" & @crlf & _
        "YDRDJ" & @crlf & _
        "YDRDH" & @crlf & _
        "YDRDG" & @crlf & _
        "YDRDF" & @crlf & _
        "YDRDD" & @crlf & _
        "YDRDY" & @crlf & _
        "YDRDX" & @crlf & _
        "YDRDC" & @crlf & _
        "YDRDV" & @crlf & _
        "YDRDB" & @crlf & _
        "YDRDN" & @crlf & _
        "YDRDM" & @crlf & _
        "YDRD2" & @crlf & _
        "YDRD3" & @crlf & _
        "YDRD4" & @crlf & _
        "YDRD5" & @crlf & _
        "YDRD6" & @crlf & _
        "YDRD7" & @crlf & _
        "YDRD8" & @crlf & _
        "YDRD9" & @crlf & _
        "YDRYQ" & @crlf & _
        "YDRYW" & @crlf & _
        "QWQQQ" & @crlf & _
        "QWQQW" & @crlf & _
        "QWQQR" & @crlf & _
        "QWQQT" & @crlf & _
        "QWQQP" & @crlf & _
        "QWQQK" & @crlf & _
        "QWQQJ" & @crlf & _
        "QWQQH" & @crlf & _
        "QWQQG" & @crlf & _
        "QWQQF" & @crlf & _
        "QWQQD" & @crlf & _
        "QWQQY" & @crlf & _
        "QWQQX" & @crlf & _
        "QWQQC" & @crlf & _
        "QWQQV" & @crlf & _
        "QWQQB" & @crlf & _
        "QWQQN" & @crlf & _
        "QWQQM" & @crlf & _
        "QWQQ2" & @crlf & _
        "QWQQ3" & @crlf & _
        "QWQQ4" & @crlf & _
        "QWQQ5" & @crlf & _
        "QWQQ6" & @crlf & _
        "QWQQ7" & @crlf & _
        "QWQQ8" & @crlf & _
        "QWQQ9" & @crlf & _
        "QWQWQ" & @crlf & _
        "QWQWW" & @crlf & _
        "QWQWR" & @crlf & _
        "QWQWT" & @crlf & _
        "QWQWP" & @crlf & _
        "QWQWK" & @crlf & _
        "QWQWJ" & @crlf & _
        "QWQWH" & @crlf & _
        "QWQWG"

    ; Msgbox(0,"", $txt)

    $t = timerinit()
    $res0 = StringRegExpReplace($txt, '.*?(.)(?=(.*?\1){2}).*\R?', "")
    $t0 = timerdiff($t)

    $t = timerinit()
    $res1 = StringRegExpReplace($txt, "(?m)(?|(^(.)\2\2..\R?)|(^(.)\2.\2.\R?)|(^(.)\2..\2\R?)|(^(.).\2\2.\R?)|(^(.).\2.\2\R?)|(^(.)..\2\2\R?)|(^.(.)\2\2.\R?)|(^.(.)\2.\2\R?)|(^.(.).\2\2\R?)|(^..(.)\2\2\R?))", "")
    $t1 = timerdiff($t)

    $t = timerinit()
    $res2 = StringRegExpReplace($txt, "(?m)(?|^(.)\1\1..\R?|^(.)\1.\1.\R?|^(.)\1..\1\R?|^(.).\1\1.\R?|^(.).\1.\1\R?|^(.)..\1\1\R?|^.(.)\1\1.\R?|^.(.)\1.\1\R?|^.(.).\1\1\R?|^..(.)\1\1\R?)", "")
    $t2 = timerdiff($t)

    Filewrite("0.txt", $res0)
    Filewrite("1.txt", $res1)
    Filewrite("2.txt", $res2)

    msgbox(0,"", $t0 &@crlf& $t1 &@crlf& $t2)
jchd Posted October 6, 2019 (edited)

Precedence doesn't mean speed! But yes indeed, I must have exchanged the timings.(*)

The pattern is even faster with ^ and \R? factored out of the alternation.

    $res2 = StringRegExpReplace($txt, "(?m)^(?|(.)\1\1..|(.)\1.\1.|(.)\1..\1|(.).\1\1.|(.).\1.\1|(.)..\1\1|.(.)\1\1.|.(.)\1.\1|.(.).\1\1|..(.)\1\1)\R?", "")

But timings depend on other factors:

    $t = timerinit()
    For $i = 1 To 10000
        $res0 = StringRegExpReplace($txt, '.*?(.)(?=(.*?\1){2}).*\R?', "")
    Next
    $t0 = timerdiff($t)

    $t = timerinit()
    For $i = 1 To 10000
        $res1 = StringRegExpReplace($txt, "(?m)^(?|((.)\2\2..)|((.)\2.\2.)|((.)\2..\2)|((.).\2\2.)|((.).\2.\2)|((.)..\2\2)|(.(.)\2\2.)|(.(.)\2.\2)|(.(.).\2\2)|(..(.)\2\2))\R?", "")
    Next
    $t1 = timerdiff($t)

    $t = timerinit()
    For $i = 1 To 10000
        $res2 = StringRegExpReplace($txt, "(?m)^(?|(.)\1\1..|(.)\1.\1.|(.)\1..\1|(.).\1\1.|(.).\1.\1|(.)..\1\1|.(.)\1\1.|.(.)\1.\1|.(.).\1\1|..(.)\1\1)\R?", "")
    Next
    $t2 = timerdiff($t)

Vary the iteration bound to see that.

(*) I must have used 10000 when comparing.

Edited October 6, 2019 by jchd
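One way to follow the suggestion of varying the iteration bound is to wrap the timing loop in a small helper and report the average time per call for several bounds. This is only a sketch: the helper name _TimePattern, the short sample text and the chosen bounds are assumptions for illustration, and the absolute numbers will differ from machine to machine.

```autoit
; Hypothetical helper: average milliseconds per StringRegExpReplace call.
Func _TimePattern($sText, $sPattern, $iRuns)
    Local $sResult
    Local $hTimer = TimerInit()
    For $i = 1 To $iRuns
        $sResult = StringRegExpReplace($sText, $sPattern, "")
    Next
    Return TimerDiff($hTimer) / $iRuns
EndFunc   ;==>_TimePattern

; Short illustrative input; substitute the full $txt block from the earlier post.
Local $sText = "9999V" & @CRLF & "QQW4N" & @CRLF & "QWQQQ" & @CRLF & "YDRDK"

; The lookahead pattern and the factored alternation, both taken from the thread.
Local $sLookahead = '.*?(.)(?=(.*?\1){2}).*\R?'
Local $sAlternation = "(?m)^(?|(.)\1\1..|(.)\1.\1.|(.)\1..\1|(.).\1\1.|(.).\1.\1|(.)..\1\1|.(.)\1\1.|.(.)\1.\1|.(.).\1\1|..(.)\1\1)\R?"

; Report per-call averages for a few iteration bounds.
Local $aRuns[3] = [1, 100, 10000]
For $iRuns In $aRuns
    ConsoleWrite($iRuns & " runs: lookahead " & _TimePattern($sText, $sLookahead, $iRuns) & _
            " ms, alternation " & _TimePattern($sText, $sAlternation, $iRuns) & " ms" & @CRLF)
Next
```

Dividing by the number of runs makes the figures comparable across bounds, so any one-off cost that dominates a single call shows up as a higher average in the first row.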