
AutoIt and PowerShell/Python


IAMK


I read the following but still have some questions, as my circumstances may be bigger than what was discussed in the thread.

How quick is AutoIt at parsing text files? For example, if a simple PowerShell/Python script takes ~5 hours to run, would AutoIt be slower or about the same speed on average?

I was planning on using AutoIt for the GUI, but I would like to know whether I should write everything in AutoIt, or use AutoIt only for the GUI and have it call the PowerShell/Python scripts I already have.

Examples of my PowerShell/Python scripts:
1- Parse 1 GB+ of text files and count how many times certain strings appear.
2- Read a file into an array and find all unique combinations (out of millions of possible mathematical combinations).
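
To give a rough idea of script 1, counting string occurrences could be sketched like this in Python (the target strings and file name are placeholders, not my actual data; reading line by line keeps memory use flat even on 1 GB+ files):

```python
from collections import Counter

# Illustrative sketch only: these target strings are placeholders.
TARGETS = ("ERROR", "WARN", "TIMEOUT")

def count_targets(path, targets=TARGETS):
    """Count how many times each target string appears in a text file."""
    counts = Counter()
    # Read line by line so memory stays flat even for very large files.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            for t in targets:
                counts[t] += line.count(t)
    return counts
```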

 


AutoIt isn't "quick", but then again I don't think Python or PowerShell are that much faster. If you want to really crank up the speed, you will need to move to something like C++, which is what I ended up doing for the massive .txt files I used to validate. Little thread here: https://www.autoitscript.com/forum/topic/183911-which-language-to-learn/

Basically, what took 6 hours to run in AutoIt took only 10 minutes with C++, and I was able to copy/paste from my AutoIt code with minimal changes.


Python is plenty fast enough if you know what you are doing. Here's a good read; there are plenty of good parsers out there, but you must design efficient code.

https://www.quora.com/What-are-the-fastest-ways-to-read-text-lines-in-large-files-by-Python

And how about asking Google?

https://www.google.com/search?q=high+performance+file+parser+python&rlz=1C1GCEU_enUS825US825&oq=high&aqs=chrome.2.69i57j69i61j69i59l2j69i61l2.2064j0j4&sourceid=chrome&ie=UTF-8

 

Edited by Earthshine


 


@BigDaddyO Thanks. I will look into FreeBASIC. Otherwise I will probably go for C++.

@Earthshine Thanks. I'm not familiar with Python, but I was told to write in it, and getting it to run fast was a challenge. For my Python script, the file read takes milliseconds, but the operations go on forever. Even now, looking at it requires me to reread it to understand it again. The messy code is something like:

for a, aVal in enumerate(saa_data[0]):
    for b, bVal in enumerate(saa_data[1]):
        for c, cVal in enumerate(saa_data[2]):
            for i, iVal in enumerate(ia_lines):
                if (saa_data[0][getNum(0, a, i)] or ia_wild[0]) == (saa_data[1][getNum(1, b, i)] or ia_wild[0]) == (saa_data[2][getNum(2, c, i)] or ia_wild[0]):
                    for d, dVal in enumerate(saa_data[3]):
                        if (saa_data[0][getNum(0, a, i)] or ia_wild[0]) == (saa_data[3][getNum(3, d, i)] or ia_wild[0]):
                            for e, eVal in enumerate(saa_data[4]):
                                if (saa_data[0][getNum(0, a, i)] or ia_wild[0]) == (saa_data[4][getNum(4, e, i)] or ia_wild[0]):
                                    if not searchOrAddRecord():  # a new unique combination was found
                                        fout.write(",".join(map(str, (a, b, c, d, e))) + "\n")

 


Poorly written code is always hard to understand. I would use the pandas parser in Python; there are many parsers out there, but that is a really fast one.

Check out this article:

https://www.vipinajayakumar.com/parsing-text-with-python/#parsing-text-in-standard-format

 

And this is the general way to parse a file line by line in Python:

with open('yourfile.csv') as f:
    for line in f:
        print('line: ' + line)

You could write functions that do all the searches, and use regular expressions to speed it up.
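
For instance, one compiled alternation pattern can replace several separate substring scans per line (a sketch; the search strings below are placeholders):

```python
import re

# Sketch: a single compiled alternation instead of several substring scans.
# The search strings are placeholders, not real log tokens.
PATTERN = re.compile(r"ERROR|WARN|TIMEOUT")

def count_matches(lines):
    """Count occurrences of each pattern alternative across the given lines."""
    counts = {}
    for line in lines:
        for match in PATTERN.findall(line):
            counts[match] = counts.get(match, 0) + 1
    return counts
```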

 

Still, it won't match C, but the link junkew posted above could be just what you want.

Edited by Earthshine


 


Maybe you could share some examples of your logs and what you are searching for. Using arrays may not be the perfect solution for your problem.

If you are searching for n values, it could be smart, as a first step, to split the log files and remove the garbage you are not interested in.

 


@Earthshine @junkew I cannot share the actual logs, as they contain confidential data.
However, for the Python script, the log is a small one that takes milliseconds to read. The operations are done on my array.

Anyway, I'll read up on LarsJ's safearray stuff. Sounds good. Thanks.

@BigDaddyO Correct. No validation is necessary (I manually verified the format and data before creating the script). I'm not really pulling data out, but rather processing it.

Edited by IAMK

Just the layout of your log, a few lines with some non-confidential data, and what your search strings look like would help people on the forum make you some example scripts on possible approaches.

Are the logs analyzed in real time, or are they static, offloaded logs?

Question 2 especially, getting all the combinations, needs some explanation.

Edited by junkew

@junkew Ehhh, the stripped-down version is that each loop represents a letter. The letters are read from the static file into arrays before the loops come into action on the arrays.

E.g. hello, helli, helol, hallo are uniques which get logged, and if another hello is found, it doesn't get logged a second time.
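
That dedup idea could be sketched with a set (illustrative names only, not my actual script):

```python
# Sketch of the idea above: log each combination only the first time it
# appears; repeats are skipped.
def log_unique(combinations, fout):
    seen = set()
    for combo in combinations:
        if combo not in seen:  # a new unique combination
            seen.add(combo)
            fout.write(combo + "\n")
    return seen
```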


I would like to make a few comments on the example that junkew and Earthshine have referred to. The example is about handling (large) AutoIt arrays through compiled code.

There are two ways to handle AutoIt arrays with compiled code. One is to use C# or VB code through the .NET Framework. This is clearly the easiest way: arrays are simply passed back and forth between AutoIt and compiled code as parameters in object methods.

The other way is to use a standard DLL file implemented in C/C++ or a similar language. Arrays are passed back and forth between AutoIt and the functions in the DLL as pointers to safearrays. This way is more difficult because you are forced to deal with safearrays, since native AutoIt arrays are internally stored as safearrays. This is described in Accessing AutoIt Variables.


A few days ago there was a question (Limits to StringSplit?) that is partly reminiscent of the problem described in the first post above. The question was about how to handle a 1 GB csv file with 60 million rows. The problem is that AutoIt only handles arrays of up to 16 million elements and isn't fast at handling such large amounts of data.

There were two suggestions for solving the problem. One suggestion, by jchd, is based on using SQLite. The second proposal is based on using VB code through the .NET Framework.

The two suggestions resemble each other in that both add some extra functionality to AutoIt so that the problem can be handled.

The SQLite solution requires access to the SQLite dll-file, and a database must be created. This is relatively easy through the SQLite.au3 UDF. You then have complete database functionality available in AutoIt. In addition, knowledge of the SQL language is necessary.
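
To show the shape of that approach, here is a rough sketch in Python (using the built-in sqlite3 module, since Python is the language used elsewhere in this thread; in AutoIt the same idea goes through SQLite.au3): stage the rows in a table once, then let SQL do the counting instead of an in-memory array.

```python
import sqlite3

# Rough sketch: load lines into a SQLite table, then count matching rows
# with SQL rather than looping over a huge array.
def count_with_sqlite(lines, needle, db_path=":memory:"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE log (line TEXT)")
    con.executemany("INSERT INTO log VALUES (?)", ((l,) for l in lines))
    (n,) = con.execute(
        "SELECT COUNT(*) FROM log WHERE line LIKE '%' || ? || '%'",
        (needle,),
    ).fetchone()
    con.close()
    return n
```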

The VB.NET solution requires access to the .NET dll-files; the VB code must be loaded from the source file, compiled, and objects created. This is relatively easy through the DotNetAll.au3 UDF. You can then use compiled, multi-threaded code in AutoIt and handle very large arrays. In addition, knowledge of the VB (VB.NET) language is necessary.


For the problem described in the first post I would recommend a VB.NET solution, although an SQLite solution cannot be ruled out.


@LarsJ Thanks for the comprehensive information.

I also see I may have misled people with my first post. My arrays/file sizes are not extreme; it is just the number of times the operation is performed that is extreme.
PowerShell: opens 10+ 100 MB text files one at a time and parses them.
Python: opens a 10 KB text file and parses it into an array with a total of 500-1500 elements. Another array stores the unique combinations (~1000).

I should be good for now, I think. Thanks.

