Jump to content

Recommended Posts

Posted

Hi all,

 

I was pondering over a question with regards to the speeds of reading something and did not see this kind of question in a forum search.

The question: What is (technically) faster? Multiple reads from the same 3d array cell, or only once make a 'temp' variable from that cell and read the value from this? I don't know if either has any real impact at all anyway, but just wanted to ask anyway. :-)

 

There may be a difference if the value holds an integer or a string (or something else) but in my case, is a simple integer.

To hopefully clarify with a small bit of code:

$process = $start - 15
If $xy[$process][3] <> "x" Then
    If _ArraySearch($open, $process, 1, $open[0][0], 0, 0, 1, 1) <> -1 Then
        UpdateOpen($xy[$process][5], $closed[0][0])
    ElseIf $start > 0 And _ArraySearch($closed, $process, 1, $closed[0][0], 0, 0, 1, 0) = -1 Then
        Add_open($start, $closed[0][0], $counter, $process)
    EndIf
EndIf

You can read from this, that the array $closed[0][0] is being read 3 times. And this goes on further in the code I did not show.

My question boils down to this, should I make a 'temp' variable to hold that $closed[0][0] value until the function is done?

 

It may not have a real impact on my small script, but I really am interested in the answer at least.

 

Regards,

Tri.

My active project(s): A-maze-ing generator (generates a maze)

My archived project(s): Pong3 (Multi-pinger)

Posted (edited)

I did a quick perf test, and there does not appear to be a major disadvantage either way from what I can tell.  Assigning a local variable instead of accessing the array is barely faster...and not always based on my test script.

#INCLUDE <ARRAY.AU3>
_ArrayReadTest1()
_ArrayReadTest2()


Local $iTime1 = 0
Local $iTime2 = 0
For $iZ = 1 to 10
    $iValue1 = Random(100,1000,1)
    $iValue2 = Random(1,10,1)

    $iTime1 += _ArrayReadTest1($iValue1,$iValue2)
    $iTime2 += _ArrayReadTest2($iValue1,$iValue2)
Next

$iTime1 = $iTime1 / 10
$iTime2 = $iTime2 / 10

ConsoleWrite("ArrayReadTest1 Average Time: " & $iTime1 & @CRLF)
ConsoleWrite("ArrayReadTest2 Average Time: " & $iTime2 & @CRLF)

Func _ArrayReadTest1($iArrayDimension1=10,$iArrayDimension2=3)
    ConsoleWrite("ArrayReadTest1" & @CRLF)
    ;Create Array for Testing
    Local $aData[$iArrayDimension1][$iArrayDimension2]
    For $iX = 0 to $iArrayDimension1-1
        For $iY = 0 to $iArrayDimension2-1
            $aData[$iX][$iY] = "Blah"
        Next
    Next
    Local $iDim1 = Random(0,$iArrayDimension1-1,1)
    Local $iDim2 = Random(0,$iArrayDimension2-1,1)
    ;Time Measurement
    Local $hTimer = TimerInit()
    ConsoleWrite($aData[$iDim1][$iDim2] & @TAB)
    ConsoleWrite($aData[$iDim1][$iDim2] & @TAB)
    ConsoleWrite($aData[$iDim1][$iDim2] & @CRLF)
    Local $iTime = TimerDiff($hTimer)
    ConsoleWrite("Duration: " & $iTime & @CRLF & @CRLF)
    $hTimer = 0
    Return $iTime
EndFunc

Func _ArrayReadTest2($iArrayDimension1=10,$iArrayDimension2=3)
    ConsoleWrite("ArrayReadTest2" & @CRLF)
    ;Create Array for Testing
    Local $aData[$iArrayDimension1][$iArrayDimension2]
    For $iX = 0 to $iArrayDimension1-1
        For $iY = 0 to $iArrayDimension2-1
            $aData[$iX][$iY] = "Blah"
        Next
    Next
    Local $iDim1 = Random(0,$iArrayDimension1-1,1)
    Local $iDim2 = Random(0,$iArrayDimension2-1,1)
    ;Time Measurement
    Local $hTimer = TimerInit()
    Local $iValue = $aData[$iDim1][$iDim2]
    ConsoleWrite($iValue & @TAB)
    ConsoleWrite($iValue & @TAB)
    ConsoleWrite($iValue & @CRLF)
    Local $iTime = TimerDiff($hTimer)
    ConsoleWrite("Duration: " & $iTime & @CRLF & @CRLF)
    $hTimer = 0
    Return $iTime
EndFunc


So performance wise, I don;t think it matters too much (unless my test logic is flawed).  Depending on the script and code, you may want to assign a local variable to capture a state of the array index if background tasks may/will update the array.  

Edited by spudw2k
Posted

Triblade, what an interesting question.

Under the waters, any array is just an address of where to start finding the values in memory. Say you're storing 32-bit integers in an array, this will be stored in a contiguous block of memory. If we were to look at the memory, it would look like:

base address (start) and then 4 bytes integer, 4 bytes integer, 4 bytes integer, and so on.

If the base address of this array is 50, array index 2 is going to be at 50 + (2 x 4) where 4 is the size in bytes of the integer. In statically typed languages, the number of bytes for your value is determined at compile time so this can be 'baked' into the program to avoid any calculations. In AutoIt, a dynamically typed language, there exists something called a variant (based on 2005 source but probably not changed) which can store any type of data: Int, long, double, window handle, and such small data types easily fit in the variant structure. These are stored sequentially for optimization reasons with a minor overhead. Large values such as strings are stored in another location and the variant simply holds the address to this string memory, so this types of data are exceptional and should be considered carefully when thinking about performance.

Multidimensional arrays get turned into flat arrays, where the indexes are multiplied by one another. So $closed[3][4] is just $closed base address + ((3 * 4) * 4). Again taking 4 bytes for each integer. So you can consider $closed[3][4] to be another syntax for $closed[3*4]. This is really not a special case for the underlying language.

The CPU is incredibly fast. He always copies data from RAM to cache in sizes which are called a block. This is mostly 4K of memory at once copied from RAM, which take about 120+ CPU cycles to get into cache. Modern CPU have 3 levels of cache, called L3, L2 and L1. Depending on manufacturer, these caches have different sizes and this is a large contributor into what makes some modern CPU's feel fast and some slow apart from just raw clock speed.

Accessing arrays requires the CPU to multiply the indexes and then get the whole block of memory into cache. Multiplying the dimensions for your array in your script, 0 by 0, is trivial. This takes only 1 or 2 CPU cycles. Actually retrieving the memory from RAM takes 120+ cycles. Once this memory is in cache, it will take less than 20 CPU cycles to retrieve it again. How do we know if memory is still in cache? This depends on a lot of factors, but the main factor here is your operating system.

Your operating system slices the total CPU time up into what are called slices. Each thread on the operating system gets a time slice according to some prioritization mechanism. Familiarity with this might include opening up task manager, seeing the total amount of threads on the system, and changing the priority of a process to High or Realtime. Now do not be alarmed at the 2000+ something threads running on your system. A vast majority of these threads are in sleep mode, and the scheduler simply skips over them.

Our program will be executing within one of these time slices. This means at the start of the time slice, our CPU will get RAM into cache and will as quickly as possible execute our program. The realistic expected time frame of accesses to the same point in memory should be around 120 cycles for the first access and 20 cycles afterwards. As long as we write our code in tight loops, where we are not voluntarily relinquishing CPU time back to the operating system when we are waiting for something, our data should still be in cache.

Realistically, data still being in cache is determined by a cache eviction policy/algorithm and there is really nothing you can do about this other than not submit to thread switches and perhaps write a nice letter to Intel/AMD where you offer a couple of beers and to sacrifice a goat. Each vendor individually has their own cache eviction policy and I believe in x86 or x86-64 these are not standardized and therefore make for a source of competition for these companies.(Citation needed)

Strings and large memory structures are special. In this case, the variant does not hold the data directly but holds a reference to where the data is really stored. This means the CPU must get the array from memory, which points to other blocks of memory which must also be gotten from RAM. This means not only one of those terrible 120+ cycle accesses, but several, just to access one item. There is really no good solution for this with arrays. Other data types which allow elements to be of non-uniform length might be better suited for the application. This is more or less the same for statically typed languages.

THIS COVERS ARRAY ACCESS.

Now keep in mind that code is just another form of data. The code must also be read into CPU cache before it can be executed. If the AutoIt language makes some bad choices regarding caches; this will severely impact performance and may invalidate some or all of the above. For example, code pointing to other code, to other code, which must all be gotten from cache in sequence is going to make a practical program - which follows all the correct performance guidelines - very, very slow. This is why the community prefers to do benchmarks - but there's nothing wrong with a little theory from time to time.

I believe AutoIt to do this mostly pretty OK in practice. The AutoIt source from 2005 seems to confirm this and that's really the best I can do on this part.

Disclaimer: I've had to dumb this down, and only somewhat reflects the intricacies of modern computing.

Posted
On 3/2/2017 at 4:36 AM, spudw2k said:

I did a quick perf test, and there does not appear to be a major disadvantage either way from what I can tell.  [...]

So performance wise, I don;t think it matters too much (unless my test logic is flawed).

Thanks for the test setup! :)

It seems to gives a small advantage to the read of the variable. See the answer of jvanegmond for a probable cause.

(I did move the setting of the variable before the timestart btw.)

 

10 hours ago, jvanegmond said:

Triblade, what an interesting question.

Under the waters, any array is just an address of where to start finding the values in memory. Say you're storing 32-bit integers in an array, this will be stored in a contiguous block of memory. If we were to look at the memory, it would look like:

base address (start) and then 4 bytes integer, 4 bytes integer, 4 bytes integer, and so on.

Multidimensional arrays get turned into flat arrays, where the indexes are multiplied by one another. So $closed[3][4] is just $closed base address + ((3 * 4) * 4). Again taking 4 bytes for each integer. So you can consider $closed[3][4] to be another syntax for $closed[3*4].

The CPU is incredibly fast. He always copies data from RAM to cache in sizes which are called a block. This is mostly 4K of memory at once copied from RAM, which take about 120+ CPU cycles to get into cache.

Accessing arrays requires the CPU to multiply the indexes and then get the whole block of memory into cache.

Our program will be executing within one of these time slices. This means at the start of the time slice, our CPU will get RAM into cache and will as quickly as possible execute our program. The realistic expected time frame of accesses to the same point in memory should be around 120 cycles for the first access and 20 cycles afterwards. As long as we write our code in tight loops, where we are not voluntarily relinquishing CPU time back to the operating system when we are waiting for something, our data should still be in cache.

Strings and large memory structures are special.

Disclaimer: I've had to dumb this down, and only somewhat reflects the intricacies of modern computing.

* I've cut out a lot of text in the quote because of saving space, and my answers/questions are directly related to the left over sentences. I absolutely did not skip anything.


It may be an interesting question, but an even much more interesting explanation! Thank you for the trouble!

If I understand the gist of it, a variable or an array with integers are both handled the same in that they reside in RAM and get moved for processing to the CPU caches if they are not too large. This means that the slice of CPU that processes the lot have the same amount of cycles needed to get either variable or array cell since they occupy the same cache environment.

The only thing that could explain the 'slower' array speed in the example spudw2k made is the multiplying the CPU has to do to put the array in the cache in the format it needs. I'm 70% sure I've figured this out by dissecting your explanation.

I understand the need for dumbing this down, I had to read this a few times over to process. ^_^

 

Is the gist I wrote about correct?

My active project(s): A-maze-ing generator (generates a maze)

My archived project(s): Pong3 (Multi-pinger)

Posted

Your gist understanding is correct. Except when the array is too large to fit, for example it spans 1 contiguous block of 100MB, this array will just be read into the CPU cache at page-size blocks at a time based on the subscripts you are accessing. This is why sequentially accessing an array is faster than randomly accessing it. The data for the next element will already be in cache.

The test which spudw2k did unfortunately has some problems which make interpreting any meaningful result from it difficult. The array is reinitialized and then accessed immediately after, which can leave the array in the cache. But this depends on CPU make & model. Random numbers are generated in a non-deterministic way, which can affect the outcome of each new test. 10 samples taken is not nearly enough to eliminate random noise, from other programs and the operating system. Probably TimerInit/TimerDiff should be replaced for all such performance measurement scenarios with a straight up call to a QueryPerformanceCounter function, but then DllCall is kinda slow too so it's a lesser of evils decision. There are more problems, none are easy to really properly compensate for.

AutoIt is in this sense not designed with such small performance thoughts in mind. It's better to focus on things we know which will have an affect on performance. Having the data already in cache is referred to as locality of reference and can help in a big way in your initial example. It shouldn't really matter if you copied it into a temporary variable or are accessing it directly, it matters that its in cache or no.

Posted

I understand the principle now at last, that principle is the important takeaway.

My script/arrays is probably never going to be big enough to have to stay in RAM I think, but now I know that because of this thread :P

 

I do think the test could be useful in my specific case, because I intend my script to reset the array and then run again when needed.

 

So, thanks again for the confirmation and interesting material! :D

My active project(s): A-maze-ing generator (generates a maze)

My archived project(s): Pong3 (Multi-pinger)

Posted

How much does a few hundred CPU cycles more or less mean if the AutoIt code interpreter needs millions of CPU cycles to interpret a single line of AutoIt code?

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...