IAMK Posted February 25, 2019

I read the following but still have some questions, as my circumstances may be bigger than what was discussed in the thread. How quick is AutoIt at parsing text files? E.g. if a simple PowerShell/Python script takes ~5 hours to run, would AutoIt be slower or the same speed (on average)? I was planning on using AutoIt for the GUI, but would like to know if I should make it purely AutoIt, or just use AutoIt for the GUI and have it call the PowerShell/Python scripts I have.

Examples of my PowerShell/Python scripts:
1. Parse 1GB+ of text files and count how many times certain strings appear.
2. Read a file into an array and find all unique combinations (out of millions of possible mathematical combinations).
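IAMK's first script (counting how often certain strings appear across 1GB+ of text files) could be sketched in Python by reading line by line, so the whole file never sits in memory. The file pattern and search terms below are made-up placeholders, not details from the thread.

```python
# Sketch: count occurrences of certain strings across large text files,
# streaming line by line so a 1GB+ file is never fully loaded.
import glob

def count_terms(pattern, terms):
    counts = {t: 0 for t in terms}
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                for t in terms:
                    counts[t] += line.count(t)
    return counts

print(count_terms("logs/*.txt", ["ERROR", "WARN"]))
```

Streaming keeps memory flat regardless of file size; the per-line `str.count` calls are where most of the time goes.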
FrancescoDiMuro Posted February 25, 2019

@IAMK You should just test them. By the way, you can use AutoIt to do everything you listed. Remember to post samples if you want to "go further".
Dwalfware Posted February 25, 2019

Arrays are limited to 16 million entries, I think?
BigDaddyO Posted February 25, 2019

AutoIt isn't "quick", but then again I don't think Python or PowerShell are that much faster. If you want to really crank up the speed you will need to move to something like C++, which is what I ended up doing for the massive .txt files I used to validate. Little thread here: https://www.autoitscript.com/forum/topic/183911-which-language-to-learn/

Basically, what took 6 hours to run in AutoIt took only 10 minutes with C++, and I was able to copy/paste with minimal changes from my AutoIt code.
Earthshine Posted February 25, 2019 (edited)

Python is plenty fast enough if you know what you are doing. Here's a good read; there are plenty of good parsers out there, but you must design efficient code. https://www.quora.com/What-are-the-fastest-ways-to-read-text-lines-in-large-files-by-Python

And how about asking Google? https://www.google.com/search?q=high+performance+file+parser+python&rlz=1C1GCEU_enUS825US825&oq=high&aqs=chrome.2.69i57j69i61j69i59l2j69i61l2.2064j0j4&sourceid=chrome&ie=UTF-8
IAMK Posted February 26, 2019 (Author)

@BigDaddyO Thanks. I will look into FreeBasic. Else I will probably go for C++.

@Earthshine Thanks. I'm not familiar with Python, but I was told to write in it, and getting it fast was a challenge. For my Python script, the file read takes milliseconds, but the operations go on forever. Even now, looking at it requires me to reread it to understand it again. The messy code is something like:

```python
for a, aVal in enumerate(saa_data[0]):
    for b, bVal in enumerate(saa_data[1]):
        for c, cVal in enumerate(saa_data[2]):
            for i, iVal in enumerate(ia_lines):
                if (saa_data[0][getNum(0, a, i)] or ia_wild[0]) == (saa_data[1][getNum(1, b, i)] or ia_wild[0]) == (saa_data[2][getNum(2, c, i)] or ia_wild[0]):
                    for d, dVal in enumerate(saa_data[3]):
                        if (saa_data[0][getNum(0, a, i)] or ia_wild[0]) == (saa_data[3][getNum(3, d, i)] or ia_wild[0]):
                            for e, eVal in enumerate(saa_data[4]):
                                if (saa_data[0][getNum(0, a, i)] or ia_wild[0]) == (saa_data[4][getNum(4, e, i)] or ia_wild[0]):
                                    if searchOrAddRecord() == False:  # If new unique combination was found.
                                        fout.write(str(a) + "," + str(b) + "," + str(c) + "," + str(d) + "," + str(e) + '\n')
```
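Nested index loops like the ones in IAMK's script can be flattened with `itertools.product`, which often makes the filtering conditions easier to read and profile. This sketch only shows the loop shape with made-up stand-in data; it deliberately leaves out the original `getNum`/wildcard logic, which is not fully shown in the thread.

```python
# Sketch: flatten nested index loops with itertools.product.
# The data and the equality filter are illustrative stand-ins only.
import itertools

saa_data = [["x", "y"], ["x", "z"], ["x"]]  # made-up stand-in data
matches = []
for a, b, c in itertools.product(*[range(len(col)) for col in saa_data]):
    if saa_data[0][a] == saa_data[1][b] == saa_data[2][c]:
        matches.append((a, b, c))
print(matches)  # [(0, 0, 0)]
```

The total work is the same as the nested loops, but a single flat loop makes it easier to add early exits or move the cheapest test first.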
junkew Posted February 26, 2019
Earthshine Posted February 26, 2019 (edited)

Poorly written code is always hard to understand. I would use the Pandas parser in Python; there are so many, but that is a really fast one. Check this article out: https://www.vipinajayakumar.com/parsing-text-with-python/#parsing-text-in-standard-format

And this is the way to parse in general in Python:

```python
with open('yourfile.csv') as f:
    for line in f:
        print('lines' + line)
```

You could write functions that do all the searches, and use regular expressions to speed it up. Still, it won't match C, but that link junkew posted above could be just what you want.
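Earthshine's suggestion to use regular expressions for the searches could look something like this: one compiled pattern matching all the search terms, applied per line, so each line is scanned once instead of once per term. The terms here are illustrative, not from the thread.

```python
# Sketch: one compiled regex counts several search terms in a single
# pass over each line. The terms are made up for illustration.
import re
from collections import Counter

def regex_count(lines, terms):
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    counts = Counter()
    for line in lines:
        counts.update(pattern.findall(line))
    return counts
```

`re.escape` keeps literal terms safe inside the alternation, and `Counter.update` tallies every match `findall` returns.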
junkew Posted February 26, 2019

Maybe you have some examples of your logs and what you are searching for. Maybe arrays are not the perfect solution for your problem. If you are searching for n values, a smart first step could be to split the log files and remove the garbage you are not interested in.
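junkew's "remove the garbage first" idea could be sketched as a simple pre-filter pass: copy only the lines you care about into a smaller file before doing any heavy processing. The file names and the `keep` rule below are assumptions for illustration.

```python
# Sketch: pre-filter a log file so later passes only see relevant lines.
# The keep() predicate is an illustrative assumption.
def prefilter(in_path, out_path, keep):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if keep(line):
                fout.write(line)

# Example: keep only lines starting with "keep"
# prefilter("raw.log", "filtered.log", lambda l: l.startswith("keep"))
```

A cheap filter like this can shrink the input dramatically before any expensive per-line work runs.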
BigDaddyO Posted February 26, 2019

Looks like you are simply looking for data within the text file, not validating that the data is in the proper format like I was doing. You may be better off loading the data into SQLite (_SQLite functions) and running some queries to pull the data out.
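The same SQLite idea, sketched in Python's built-in `sqlite3` instead of AutoIt's _SQLite functions (the table layout and data here are made up for illustration): load the lines once, then let the database do the grouping and counting instead of nested loops.

```python
# Sketch of the SQLite approach with Python's stdlib sqlite3 module.
# Table and column names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (line TEXT)")
conn.executemany("INSERT INTO log VALUES (?)",
                 [("hello",), ("hello",), ("hallo",)])
# Let the database do the counting instead of hand-written loops.
rows = conn.execute(
    "SELECT line, COUNT(*) FROM log GROUP BY line ORDER BY line").fetchall()
print(rows)  # [('hallo', 1), ('hello', 2)]
```

An indexed on-disk database pays off once the data no longer fits comfortably in an array.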
IAMK Posted February 27, 2019 (Author, edited)

@Earthshine @junkew I cannot share the actual logs, as it's confidential data. However, for the Python script, the log is a small one which takes milliseconds to read. The operations are done on my array. Anyway, I'll read into LarsJ's safearray stuff. Sounds good. Thanks.

@BigDaddyO Correct. No validation is necessary (I manually verified the format and data before creating the script). I'm not really pulling out data, but rather processing it.
junkew Posted February 28, 2019 (edited)

Just the layout of your log, a few lines with some non-confidential data, and what your search strings look like would help people on the forum make you some example scripts on approaches. Are the logs to be analyzed in real time, or are they static, offloaded logs? Especially question 2, getting all combinations, needs some explanation.
IAMK Posted February 28, 2019 (Author)

@junkew Ehhh, the stripped-down version would be that each loop represents a letter. The letters are read from the static file into arrays before the loops come into action on the arrays. E.g. hello, helli, helol, hallo are uniques which get logged, and if it finds another hello it doesn't get logged a second time.
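The "log each unique combination only once" behavior IAMK describes maps naturally onto a set, which gives O(1) membership tests, i.e. the role `searchOrAddRecord()` plays in the earlier code. The words are the hello/helli examples from the post.

```python
# Sketch: record each unique value once, in first-seen order.
# A set gives O(1) "have I seen this already?" checks.
def log_uniques(words):
    seen = set()
    out = []
    for w in words:
        if w not in seen:       # the searchOrAddRecord() role
            seen.add(w)
            out.append(w)
    return out

print(log_uniques(["hello", "helli", "hello", "hallo"]))
# ['hello', 'helli', 'hallo']
```

For millions of combinations, the set lookup stays fast where a linear search through an array would degrade quadratically.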
LarsJ Posted February 28, 2019

I would like to make a few comments on the example that junkew and Earthshine have referred to. The example is about handling (large) AutoIt arrays through compiled code. There are two ways to handle AutoIt arrays with compiled code.

One is to use C# or VB code through the .NET Framework. This is clearly the easiest way. Arrays are simply passed back and forth between AutoIt and compiled code as parameters in object methods.

The other way is to use a standard dll-file implemented in C/C++ or a similar language. Arrays are passed back and forth between AutoIt and functions in the dll-file as pointers to safearrays. This way is more difficult because you are forced to deal with safearrays. Safearrays, because native AutoIt arrays are internally stored as safearrays. This is described in Accessing AutoIt Variables.

A few days ago there was a question (Limits to StringSplit?) that partly resembles the problem described in the first post above. The question was about how to handle a 1GB csv-file with 60 million rows. The problem is that AutoIt only handles arrays up to 16 million elements and isn't fast at handling such large amounts of data. There were two suggestions for solving the problem. One suggestion, by jchd, is based on using SQLite. The second proposal is based on using VB code through the .NET Framework.

The two suggestions resemble each other in that they are both based on adding some additional functionality to AutoIt so that the problem can be handled. The SQLite solution requires access to the SQLite dll-file, and a database must be created. This is relatively easy through the SQLite.au3 UDF. Then there is complete database functionality in AutoIt. In addition, the SQL language is necessary.

The VB.NET solution requires access to .NET dll-files; the VB code must be loaded from the source file and compiled, and objects created. This is relatively easy through the DotNetAll.au3 UDF. Then compiled and multi-threaded code can be used in AutoIt, and very large arrays can be handled. In addition, the VB (VB.NET) language is necessary.

For the problem described in the first post I would recommend a VB.NET solution, though maybe an SQLite solution cannot be excluded.
IAMK Posted March 1, 2019 (Author)

@LarsJ Thanks for the comprehensive information. I also see I may have misled people with my first post. My arrays/file sizes are not extreme; it is just the number of times the operation is done which is extreme.

PowerShell: Open 10+ 100MB text files one at a time and parse them.
Python: Open a 10KB text file, parse it into an array with a total of 500-1500 elements. Another array stores unique combinations (1000).

I should be good for now, I think. Thanks.