
Posted (edited)

Hi guys :)

I'm in need of a script that will read all the files in a folder and look for a string in those files; the files will have multiple lines.

At the end the script will display how many times the string was found across all files.

There will be around 300 files, and the expected number of string occurrences across them is around 300k to 400k.

I did write some code for this, and it works on a test env with a couple of files that have a couple of lines each.

What I don't know is how efficient this will be in the case mentioned earlier.

So my request here is for you guys to look at my code and let me know whether this will be OK, or whether there are other ways to do this more efficiently.

Here's the code

#include <File.au3>
#include <Array.au3>

Global $bArray, $found = 0, $stringToLookFor = "PB11"

look4stringInManyFiles()

Func look4stringInManyFiles()

    $where2look4files = "C:\temp\test\"
    
    $aArray = _FileListToArray($where2look4files, "*.txt", 0, True)

    For $i = 1 To UBound($aArray) - 1
        $path2file = $aArray[$i]
        _FileReadToArray($path2file, $bArray)
        For $a = 0 To UBound($bArray) - 1
            If StringInStr($bArray[$a], $stringToLookFor) Then $found = $found + 1
        Next
    Next

    MsgBox(0, "", $found & " instances of the string were found in all files")

EndFunc   ;==>look4stringInManyFiles

Thanks! :)

 

Edited by sakej
typo
Posted

@BigDaddyO thanks for the suggestion.

I ran both versions of the code a couple of times on my small-scale test, and your version actually always took longer to complete.

Arrays did it in around 1.7 while opening and closing files needed around 2.4.

I don't want to be a smartass, as I'm here asking for help, but this makes me think that my initial approach will be better in this scenario. Anyone?

Posted (edited)

Why don't you just use one of the numerous grep-for-Windows command-line programs? Most have a switch that will provide just a count of matches. That would be much faster than any grep-like logic that you could create in AutoIt.
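For example, here's a rough sketch of that idea driven from AutoIt. It assumes GNU grep for Windows is installed and on the PATH (my assumption, not something from your setup): grep -o prints each match on its own line, so piping through cmd's find /c /v "" counts the total occurrences.

#include <AutoItConstants.au3>

; Sketch only: assumes GNU grep is on the PATH.
; "grep -r -o" prints every match on its own line; 'find /c /v ""'
; counts all the lines it receives, i.e. the total number of occurrences.
Local $iPid = Run(@ComSpec & ' /c grep -r -o --include="*.txt" "PB11" "C:\temp\test" | find /c /v ""', _
        "", @SW_HIDE, $STDOUT_CHILD)
ProcessWaitClose($iPid)
Local $iCount = Number(StdoutRead($iPid))
MsgBox(0, "", $iCount & " occurrences found by grep")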

Edited by TheXman
Posted

Hi sakej
It seems that a RegExp approach gives a faster result.
But one has to be very careful when launching several tests one after the other, as the cache memory can still be filled with the previous test's result.

 

#include <File.au3>
#include <StringConstants.au3> ; for $STR_REGEXPARRAYGLOBALMATCH

Global $bArray, $found = 0, $stringToLookFor = "PB11"
look4stringInManyFiles()

Func look4stringInManyFiles()
    $where2look4files = "C:\temp\test\"
    $aArray = _FileListToArray($where2look4files, "*.txt", 1, True)

    For $i = 1 To UBound($aArray) - 1
        $path2file = $aArray[$i]

;~      _FileReadToArray($path2file, $bArray)
;~      For $a = 0 To UBound($bArray) - 1
;~          If StringInStr($bArray[$a], $stringToLookFor) Then $found = $found + 1
;~      Next

        $sFileContent = FileRead($path2file)
        $cArray = StringRegExp($sFileContent, '(?i)' & $stringToLookFor, $STR_REGEXPARRAYGLOBALMATCH)
        If @error = 0 Then $found = $found + UBound($cArray)

    Next

    MsgBox(0, "", $found & " instances of the string were found in all files")
EndFunc   ;==>look4stringInManyFiles

 

Some remarks:
=> (?i) in the RegExp makes the results case-insensitive (to match your StringInStr() parameters)

=> changed one parameter from 0 to 1 in _FileListToArray() to return files only, not files + folders

=> in case the RegExp way brings a few more results: the explanation should be that "PB11" was found more than once in a line, where StringInStr() ignored a 2nd occurrence of "PB11" in the same line.
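If you wanted to stay with StringInStr(), a small loop over its occurrence parameter would catch those extra matches too. A sketch, using the same variables as the original loop ($STR_NOCASESENSE comes from StringConstants.au3):

; Count every occurrence of the string in the line, not just the first,
; by asking StringInStr for the 1st, 2nd, 3rd... occurrence until it fails
Local $n = 0
While StringInStr($bArray[$a], $stringToLookFor, $STR_NOCASESENSE, $n + 1)
    $n += 1
WEnd
$found += $n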

One should probably add timers to the script and launch it (with, then without, the RegExp way) at different times of the day, but certainly not test both ways one right after the other.
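For the timing itself, AutoIt's built-in TimerInit()/TimerDiff() pair is enough; something like this around each variant:

; Time one variant of the search with the high-resolution timer
Local $hTimer = TimerInit()
look4stringInManyFiles()
ConsoleWrite("Elapsed: " & Round(TimerDiff($hTimer) / 1000, 2) & " s" & @CRLF)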

Good luck :)

 

"I think you are searching a bug where there is no bug..."

Posted (edited)

Try this version; I think it should be fast:

...

$stringToLookFor = StringUpper($stringToLookFor)

For $i = 1 To UBound($aArray) - 1
    $hFile = FileOpen($aArray[$i], 0) ; mode=read
    $sContent = StringUpper(FileRead($hFile))
    FileClose($hFile)
    StringReplace($sContent, $stringToLookFor, "", 0, 1) ; casesense=1, count replacements only
    $found += @extended ; @extended holds the number of replacements made
Next

 

Edited by Zedna
Posted
15 hours ago, sakej said:

@BigDaddyO thanks for the suggestion.

I ran both versions of the code a couple of times on my small-scale test, and your version actually always took longer to complete.

Arrays did it in around 1.7 while opening and closing files needed around 2.4.

I don't want to be a smartass, as I'm here asking for help, but this makes me think that my initial approach will be better in this scenario. Anyone?

As the help file says, you will only see the improvement from doing a FileOpen() on larger files. Perhaps you should try testing on a few real files if you can. I assumed they were large, as you expect 300k to 400k finds per file.
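For reference, the pattern being discussed is just this (a minimal sketch of the help-file advice; $FO_READ comes from FileConstants.au3):

#include <FileConstants.au3>

; Open a handle first so FileRead() doesn't open and close the file
; itself; per the help file this pays off on larger files.
Local $hFile = FileOpen($path2file, $FO_READ)
Local $sContent = FileRead($hFile)
FileClose($hFile)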

Posted

Thank you all for your input, I'm really grateful for that. I'll try all the ways and pick the fastest (I don't need to care much about resources).
