Posted (edited)

(Edited from original.  Please note that I AM NOT AN AUTOIT EXPERT.  I write code using Autoit frequently but I am no expert, especially when it comes to I/O.  So any remarks that start with "Why did you..." can be answered by referring to the first sentence.  This project was done in Autoit because of an interface I built to display the data.)

Attached is a program and ascii input file I wrote to read stock price data, convert it to binary and then read it back into the program in binary.  The goal was to show increased performance for reading the files in binary and provide a demo on how to read/write binary for int32, int64, double and strings for anyone who might find it helpful.  The results on my PC show the following:

Time to read ascii file only: 456.981951167202
Ascii read & process time: 6061.83075631701
Binary write file time: 14787.9184635239
Time just to read binary file: 42.418867292311
Binary read and process time: 4515.16129830537

A couple things to note:

1) The 32 MB ascii file took 10x longer to read than the 15 MB binary file.  Not entirely sure why.  Both were read into a buffer.

2) The Binary write takes a long time but I made no effort to optimize this because the plan was to write this file one time only so I don't mind if it takes longer to write this file.  I care much more about how long it takes to read the file because I will be reading it many times.

3) There was a modest gain in converting the ascii file to binary in terms of file size and reading speed.

So big picture... not sure it's worth the effort to convert the files to binary even though most of the data is numerical data in the binary file.  That was actually surprising as I expected there would be more of a difference.  Any ideas on how to get the binary data to read at a faster rate would be great.

 

binary.au3

2019_02_08.zip

Edited by Stew
new version
Posted (edited)

Sorry but this is useless bloatware.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Posted
5 hours ago, jchd said:

Sorry but this is useless bloatware.

Based upon this naive response I'm guessing you never had to deal with massive amounts of data.  For anyone having to read large data files it is very important to be able to read/write binary.  I have massive amounts of ascii stock price data that takes long periods to read as ascii files.  By converting the files to binary my software can now read them in a fraction of the time it previously required.  

Having seen how many posts there are on these forums for programmers trying to figure out how to read/write binary using Autoit I expect it will be helpful to many others as well.  If not, no problem.  Few photons and electrons were injured in the creation of this forum posting.

Posted (edited)

He is referring to the fact that this functionality already exists in AutoIt proper. BinaryToString and StringToBinary are already an established thing. The Help file also details how to use them. Anyone can write a loop that reads in data and writes out binary with such functions. This was kind of like reinventing the wheel, I guess. Still, thanks for the effort and, if it works well for you, then you must be doing something right! Happy programming.

Edited by Earthshine

My resources are limited. You must ask the right questions

 

Posted

Thanks for your response.  I think a detailed explanation may be helpful.  Let's go through an example that may make it clear the importance of not treating numerical data as a string.

Let's say I have a large number:  2141231231  (10 characters)

I have two options for writing that number to a file: as a string of 11 bytes (10 digits plus a 1-character delimiter, written either in ascii or binary) or as an Int32, which is 4 bytes (written in binary).  BinaryToString and StringToBinary will treat this as 11 bytes, so the file is larger, although admittedly the difference depends on the number of characters in each number.  The text file could even be smaller if the numerical data on average was made up of numbers from -9 to 99 (remember, you still need a 1-character delimiter).
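To make the byte counts above concrete, here is a minimal sketch. It is written in Python rather than AutoIt purely for illustration, and it assumes a comma as the delimiter and a little-endian signed 32-bit layout; neither choice is stated in the thread.

```python
import struct

value = 2141231231  # the example number from the post; it fits in a signed 32-bit int

# Text form: ten ASCII digits plus a one-character delimiter (a comma is assumed here).
as_text = (str(value) + ",").encode("ascii")

# Binary form: a little-endian signed 32-bit integer (layout assumed for illustration).
as_int32 = struct.pack("<i", value)

text_size = len(as_text)
int32_size = len(as_int32)

# A small value is shorter as text: one digit plus the delimiter.
small_text_size = len(b"7,")

# Reading the binary form back gives the number directly, with no string parsing.
round_trip = struct.unpack("<i", as_int32)[0]
```

Here `text_size` is 11 and `int32_size` is 4, while `small_text_size` is only 2, which is the case where text can be the smaller of the two.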

But that's not the real problem.  After reading in this data as a string you still need to convert this string to a number so that it can be used in mathematical calculations.  The string is of no value for calculations.  That requires several steps with a computational penalty for each step.

For every data point I read in binary as a string I first have to use BinaryToString to get the data in string format, then I have to use StringSplit to separate the large string into individual strings for each number, and finally I have to use the Number function to convert each string into a number.  When you have millions of data points this is very slow and unnecessary.  It is much better to write the data in binary as an Int32, Int64 or Double and read it in binary as the same data type.  By doing that I eliminate BinaryToString, StringSplit and Number from the process.
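The two routes described above can be sketched side by side. This is a Python illustration of the idea, not the attached AutoIt script: the decode, split and convert steps stand in for BinaryToString, StringSplit and Number, and the price values are made up.

```python
import struct

prices = [101.25, 99.5, 100.0, 102.75]  # made-up values for illustration

# Text route: decode bytes to a string, split on the delimiter, convert each piece.
text_blob = ",".join(repr(p) for p in prices).encode("ascii")
decoded = text_blob.decode("ascii")          # bytes -> string
pieces = decoded.split(",")                  # string -> list of strings
from_text = [float(piece) for piece in pieces]  # each string -> number

# Binary route: the doubles are stored as-is and unpacked in a single call.
layout = "<%dd" % len(prices)                # e.g. "<4d": four little-endian doubles
binary_blob = struct.pack(layout, *prices)
from_binary = list(struct.unpack(layout, binary_blob))
```

Both routes recover the same numbers; the binary route simply skips the three per-value conversion steps, and its size is fixed at 8 bytes per double.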

For small file sizes this is a non-issue in regards to time savings although I think it's actually easier to read/write these files in the method I outlined than use multiple steps to convert between strings and numbers.  But for large datasets the time savings in reading these files in the way I outlined is significant.  You definitely don't want to read/write numbers as a string whenever dealing with a lot of data.

Hope that helps.

Posted

I didn't misunderstand you--there are other functions to change string to int or floats too before writing out to binary. I understand what you are doing but it can be done other ways, but whatever--you do your thing


  • Moderators
Posted
2 hours ago, Stew said:

Based upon this naive response I'm guessing you never had to deal with massive amounts of data. 

Based on this naive response, you have made everyone on this forum who knows the depth of jchd's technical knowledge laugh.

"Profanity is the last vestige of the feeble mind. For the man who cannot express himself forcibly through intellect must do so through shock and awe" - Spencer W. Kimball

How to get your question answered on this forum!

Posted

Well maybe jchd's technical knowledge is good but communication skills could use some work.  I'm guessing based upon his/her response that I probably didn't hurt their feelings too much but it's very nice that you stood up for them.  

Earthshine, actually I think there is still some misunderstanding...  I am not converting strings to int or floats before writing out to binary.  My data is already in int and float format in the computer memory as that is the form I need it in to do computations.  I just want to write it out in that format without going through a string in the process as that is an unnecessary and computationally expensive step.  The suggestion to use BinaryToString and StringToBinary would introduce this  unnecessary step.  I want to read/write as Int32, Int64 and Double directly and that is the subject of the code I posted.

If anyone has a better way to read/write numerical data in the most computationally efficient manner possible, please post it here.  My only goal is to minimize computational and i/o time in reading large numerical datasets.  If I'm re-inventing the wheel then I (and I suspect many others) have missed the posting of the wheel somewhere on this forum or in the help files.  Please tell me where you've hidden the wheel.  I'd be thrilled if someone had an even faster way to read/write large files of numerical data than the method I came up with so please post code if you  have it.  I for one would use your code.

Posted

I must apologize for my cryptic answer, hastily written. It wasn't meant to be rude to you, just pointing out that such code isn't a valuable, efficient solution.

Quote

I have massive amounts of ascii stock price data that takes long periods to read as ascii files.  By converting the files to binary my software can now read them in a fraction of the time it previously required.

I feel this now turns out as a code optimization request, rather than a failed demo that plain vanilla code is inferior.

I gently suggest you open a new thread in the General Help forum with a significant sample of input data and your requirements for processing. I strongly suspect there are faster regular ways to process text (ASCII) data than converting the whole thing to binary and then extracting the wanted pieces out of a large chunk of converted binary.

The goal of most of the fora here is to maximize the usefulness of AutoIt-based code. So we're fully spot-on, and there are a number of seasoned contributors here willing to provide as much help as possible.


Posted

jchd, thanks for your comments.  I'm actually not requesting any code optimization.  I'm also not suggesting the plain vanilla code is inferior.  Autoit is an incredible tool as is.  I was just trying to contribute some code that other programmers might find helpful.  I'm certain you could make autoit read ascii files faster with some tweaking but it won't compete with reading the same data in binary format.  It's simply impossible to do when you have to convert one using several steps and you don't have to convert the other.  So I would not submit anything to the General Help forum that I know has no chance at success.  Hopefully everyone recognizes this and won't waste time trying to do the impossible.  Reading numerical data in binary will ALWAYS be much faster than reading it in ascii.

So hopefully this series of posts makes things clear.  The sample code I submitted in the first post is ideally suited for instances where:

1) The data files are large

2) The data is primarily numerical

3) The data files need to be read several times

This set of criteria is actually quite common among people doing research (like myself), and if these criteria are met then it's better to convert the files to binary one time and only read them in binary format going forward.  The sample code I submitted can be used to accomplish this goal.

That being said, the code is incredibly simple and can be used even if the files are not large and are not entirely numerical.  As long as you are careful to read and write the data in the same way it will work for any file and ALWAYS be faster than reading the same set of data in an ascii file.  
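As a sketch of what "read and write the data in the same way" means in practice, the Python example below writes one record containing an int32, an int64, a double and a string, then reads it back with the same layout. The record layout, the field names and the values are all hypothetical and are not taken from the attached binary.au3.

```python
import io
import struct

# Hypothetical fixed header: int32 date, int64 volume, double close, uint16 ticker length.
# "<" means little-endian with no padding between fields.
HEADER = struct.Struct("<iqdH")

def write_record(stream, date, volume, close, ticker):
    """Write the fixed-size header followed by the ticker bytes."""
    raw = ticker.encode("ascii")
    stream.write(HEADER.pack(date, volume, close, len(raw)))
    stream.write(raw)

def read_record(stream):
    """Read fields back in exactly the order and widths they were written."""
    date, volume, close, length = HEADER.unpack(stream.read(HEADER.size))
    ticker = stream.read(length).decode("ascii")
    return date, volume, close, ticker

# An in-memory buffer stands in for a file opened in binary mode.
buf = io.BytesIO()
write_record(buf, 20190208, 5_000_000_000, 12.5, "XYZ")  # made-up values
buf.seek(0)
record = read_record(buf)
```

The string is stored with a length prefix so the reader knows how many bytes to consume; a fixed-width or delimiter-terminated string would work as well, as long as writer and reader agree.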

Again, I was just trying to help other programmers.  I suspect there will be programmers who need this for their research but even if I'm wrong about that at least I finally figured it out so I can use it in my own research.   I have yet to see a forum post or example code explaining how to read and write all types of data in binary using autoit.  Now there is 1.

Posted

Forgot another application.  If you are not dealing with large amounts of data but do have an application that does a lot of I/O to disk and some of the data is numerical data then you want to do that I/O in binary due to the increased speed.  If the data is entirely string data then you will probably not see a significant change in speed between ascii and binary I/O.

Posted

If/when I have to process a huge lot of numerical data provided in ASCII (for instance gazillions of points and other data from some geodesic source) I build a database from the source.
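A minimal sketch of that database approach, using Python's built-in sqlite3 module and an in-memory database. The table name, the column layout and the two sample lines are assumptions for illustration, not the format of the file attached to the first post.

```python
import sqlite3

# Made-up comma-delimited input lines: date, ticker, close, volume.
ascii_rows = ["20190207,XYZ,12.25,1000", "20190208,XYZ,12.5,1500"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (date INTEGER, ticker TEXT, close REAL, volume INTEGER)"
)

# Parse the text once, at import time; later reads go through SQL instead.
parsed = []
for line in ascii_rows:
    date, ticker, close, volume = line.split(",")
    parsed.append((int(date), ticker, float(close), int(volume)))

conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", parsed)
conn.commit()

latest_close = conn.execute(
    "SELECT close FROM prices ORDER BY date DESC LIMIT 1"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
```

The text-to-number conversion still happens, but only once when the database is built; the values are then stored as typed columns.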


Posted
On 2/12/2019 at 6:43 PM, Stew said:

I wrote this sample script to help anyone trying to read and write binary files

I believe this is what you ended up using as a solution for the question of your first posting back in '15
A file is a file that is a file, if that makes any sense. Changing the delimitation format to a more useful one for the case makes sense.
As the above post says, using a standard database may be better.
Standardized formats are peer reviewed and help one avoid reinventing the wheel.
PS: By posting, one clarifies concepts. We all learn at the end. Thanks for sharing your ideas and findings.

Follow the link to my code contribution ( and other things too ).
FAQ - Please Read Before Posting.
autoit_scripter_blue_userbar.png

Posted

argumentum, you are correct.  I haven't looked at this code in some time and decided this week to tackle the problem again and this was the solution.  I have just been reading the ascii files instead.  If you search the forum you will find I am not alone in trying to read/write binary numerical data.  There are several posts on this topic over several years.

I have updated the original post above with the full code I use to read the stock price data in ascii, write it in binary and read it back in using binary. I included an ascii file to read in the event anyone wants to mess with the code.

Posted
3 hours ago, Stew said:

If you search the forum you will find I am not alone in trying to read/write binary numerical data

Again, a file is a file that is a file, if that makes any sense. And it usually does not. Even a folder is a file with a "Directory" attribute.
All files are binary. Period. The ASCII table is an agreed standard.

The file you provided is a comma-delimited file. It could have been written so that strings and numbers are distinguishable, by enclosing the strings in " symbols so that anything not enclosed is a number. But the file generator did not do this, as it is not really needed unless there is a comma in the string.

So if you have a zero, would you write 0x00 in binary? What if it is a real? CPUs work with integers. Long story short, your concept of binary read/write, and of what a character represents, is mistaken. If you feel you would welcome help, help is afoot.
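For reference, the question of what a zero looks like on disk can be answered byte by byte. The Python sketch below assumes little-endian int32 and IEEE 754 double layouts, which is what the struct module produces with the "<" prefix.

```python
import struct

zero_as_text = "0".encode("ascii")       # the character '0' is the single byte 0x30
zero_as_int32 = struct.pack("<i", 0)     # four bytes, all 0x00
zero_as_double = struct.pack("<d", 0.0)  # eight bytes, all 0x00

# A non-zero real shows that the double encoding is not just a wider integer.
one_as_double = struct.pack("<d", 1.0)   # bytes 00 00 00 00 00 00 f0 3f
```

So the same value occupies 1, 4 or 8 bytes depending on the representation chosen, and only the text form needs a delimiter after it.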

4 hours ago, Stew said:

I am not alone in trying to read/write binary numerical data

There are many that are clueless. Read up. Right now, you may not welcome this posting, but once you incorporate the proper knowledge, it'll make sense.
I write this in humility. I've had your views before.


Posted

Although I'm no Autoit expert, I've been writing complex software for computational fluid dynamics, 3D computer modeling, computer-aided diagnosis, natural language processing, etc. for 30 years.  I've also written code that manipulates bits for 3D modeling, switch endian for reading DICOM files, etc.  So I have a very good understanding of the way data is represented in a computer's memory and stored on a hard drive.  We can get into esoteric discussions about whether data is real, binary, etc., etc. but obviously it's all binary.  Most of us simply refer to the type of data we are working with and I'm primarily working with real numbers (float, double) and integers (int, long) for this project.
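Since endian switching comes up here, a short Python sketch of what it means at the byte level. The value 0x12345678 is just a convenient illustration.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # least significant byte first: 78 56 34 12
big = struct.pack(">I", value)     # most significant byte first: 12 34 56 78

# Reading big-endian bytes as if they were little-endian gives a byte-swapped number.
swapped = struct.unpack("<I", big)[0]
```

The two encodings hold the same four bytes in opposite order, which is why a reader has to know the byte order the writer used.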

If you really want to be helpful then suggest a faster method to do I/O in this application and I'll test your ideas.  Or post some code.  That would be even better!  I don't care if you want to write ascii (text, strings or whatever you want to call it), binary, martian, etc.  I just want it to be as fast as possible getting the data from the hard drive into the RAM and associated with the proper variables.  I acknowledge that shoving all the data into a database for access is probably the best solution but this is just a fun project for me so I want to keep it as simple as possible.  I'm not excited about introducing a database.  

  • 10 months later...
Posted

Here is a quick tool that opens a folder and reads the DICOM images in it. Because of the endianness, the tags are reversed. If images are not in a subfolder, that is opened instead of the folder.

The idea is to convert the binary parts to hex, and use regex to search and replace.
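A rough Python sketch of that hex-plus-regex idea follows; it is not the tool shown in the screenshot. It assumes a little-endian byte stream and uses the DICOM Pixel Data tag (7FE0,0010) as the search target. The surrounding filler bytes are made up, and only the search half of "search and replace" is shown.

```python
import re
import struct

# Made-up stream: 4 filler bytes, the tag as two little-endian uint16 values, 2 more bytes.
data = b"\x01\x02\x03\x04" + struct.pack("<HH", 0x7FE0, 0x0010) + b"\xAA\xBB"

# In hex, each 16-bit half of the tag appears with its two bytes reversed: e07f 1000.
hex_text = data.hex()

# Keep only matches that start on a byte boundary (an even offset in the hex string),
# and convert the hex offset back to a byte offset.
matches = [
    m.start() // 2
    for m in re.finditer("e07f1000", hex_text)
    if m.start() % 2 == 0
]
```

The byte-boundary check matters because a hex string has two characters per byte, so a pattern could otherwise match across the halves of neighbouring bytes.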

There is probably a better way to do this, but this has worked for me, hope this helps

DICOMTOOL.png
