
Parallelism


JPCare

 

AutoIt was not designed for parallel processing.

 

Nevertheless, it is possible to program the execution of parallel processes via AutoIt.

 

To do so, you have to specifically design your program in two parts:

- Part 1: initialization, launching of the parallel processes, consolidation of the results

- Part 2: the process to be executed in several independent instances

How is this possible?

--------------------

1/ The Run instruction of AutoIt can be used to launch a Part 2 process on a given cpu thread, either directly via the Windows start command with its /affinity parameter or via a utility program providing the same function.

 

Obviously, the launched Part 2 processes must be compiled exe files, as they must embed all the libraries necessary for their independent execution.

 

Examples

Run(@ComSpec & " /C start /affinity "&$Affinitymask&" proc.exe")

By repeating the instruction with different hexadecimal affinity masks in $Affinitymask, you will launch parallel instances of proc.exe.
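To make the mask arithmetic concrete, here is a small sketch in Go (the language used for the port later in this thread) of how such hexadecimal masks can be derived: one bit per logical cpu, bit n for cpu id n. The function names are illustrative, not from the original scripts.

```go
package main

import "fmt"

// affinityMask returns the hexadecimal mask string that pins a process
// to a single logical cpu (bit n set for cpu id n, counted from 0).
func affinityMask(cpuid uint) string {
	return fmt.Sprintf("%X", uint64(1)<<cpuid)
}

// evenCPUMask returns a mask covering cpus 0, 2, 4, ... below n,
// as in the "even cpus only" allocation method mentioned further down.
func evenCPUMask(n uint) string {
	var m uint64
	for i := uint(0); i < n; i += 2 {
		m |= 1 << i
	}
	return fmt.Sprintf("%X", m)
}

func main() {
	fmt.Println(affinityMask(3)) // cpu 3 -> bit 3 -> "8"
	fmt.Println(evenCPUMask(8))  // cpus 0,2,4,6 -> "55"
}
```

The resulting string is what would be spliced into the start /affinity command line in place of $Affinitymask.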

 

There is another way, with a utility program that nicely accepts a cpu id (from 0 on) as a parameter instead of an affinity mask:

$slaunch="StartAffinity.exe proc.exe "&$cpuid

Run($slaunch)

 

2/ Each parallel Part 2 process must be started with its own execution environment, i.e. at least its cpu id, passed either as parameters in the Run instruction (read back via $CmdLine) or via a few EnvSet instructions just before each Run instruction, mirrored inside each process by EnvGet assignments to globals (remember that the launching instructions of the parallel processes are executed sequentially!).

 

3/ Just before terminating, each launched Part 2 process writes its results into a specific file (for example a file named after its cpu id).

 

4/ After launching the Part 2 parallel processes, the Part 1 main program waits for the result file created by each terminating Part 2 process (these files having been swept away before the launching loop), then consolidates the results.

 

 

Experimental proof of concept

---------------------------------------

Two AutoIt scripts are provided: Divisoptkim.au3 (Part 1) and loopkim.au3 (Part 2).

Aim: find the integer divisors of an integer number.

loopkim.au3 MUST BE COMPILED to an exe by the AutoIt compiler on your OS.

Place the files Divisoptkim.au3 and loopkim.exe (plus the StartAffinity.exe file if your OS is Windows XP) in the same directory, and from there start Divisoptkim.au3.

(No compilation is needed for Divisoptkim, provided that the au3 extension is associated with AutoIt on your machine.)

 

The loopkim processes will run as several independent instances if you enter a sufficiently big number.

Good numbers for testing : 5040, 66049, 13 444 333 222 110, 133 444 333 222 110, 333 333 333 333 333.

Do not hesitate to try the proposed big numbers (ca 10-15 seconds execution time on a dual-core processor); otherwise the execution time might be less than a heartbeat.

 

The graphical display of the Windows Task Manager (Resource Monitor) will show the load on each cpu.

 

Note that the software internally redefines the level of parallelism for "too small" numbers. To do so, it automatically reduces the number of allocated cpus so that each one gets a reasonable task (a minimum length for the segment of candidate numbers).
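That reduction amounts to simple integer arithmetic, sketched below in Go. The minimum segment length of 1000 is an illustrative threshold of mine, not the value used in the original scripts.

```go
package main

import "fmt"

// allocatedCPUs reduces the degree of parallelism so that each worker
// gets a segment of candidate divisors at least minSegment long,
// clamped between 1 and the number of available cpus.
func allocatedCPUs(candidates, ncpu, minSegment int) int {
	n := candidates / minSegment
	if n < 1 {
		n = 1
	}
	if n > ncpu {
		n = ncpu
	}
	return n
}

func main() {
	fmt.Println(allocatedCPUs(5040, 8, 1000))      // small number: only 5 cpus
	fmt.Println(allocatedCPUs(100000000, 8, 1000)) // big number: all 8 cpus
}
```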

When the executions terminate, as many Journal_x.txt files are (re)created as there were parallel executions of loopkim.exe, with x = cpu id (from 0 on).

 

A standard run with default parameters will use all allocated cpus and potentially reach the 100% load level on the allocated range of cpus. If you select the second allocation method (if you have at least 2 threads), only the even cpus (0,2...) will be allocated, and the average load on the range of allocated cpus will potentially culminate at a 50% level.

 

Warning. This demo software was tested on basic desktop machines with Intel cpus on various configurations, from XP Home / dual core to Windows 11 / 20 cpus. Changes in the code may be necessary in other environments.

 

For XP. A utility is required in the Part 1 process launch instruction, as the XP start command does not accept the /affinity parameter.

At least two utility programs are available.

StartAffinity can be downloaded from http://www.adsciengineering.com/StartAffinity/.
Another (bigger) tool can be found on the Heise software archive site: search for "launch".
Both utilities seem to work perfectly on recent OSes, although they were developed many years ago.

If your OS is XP, the Part 1 script checks that StartAffinity is in the execution directory.

 

All OSes. If you place a copy of StartAffinity in the execution directory, the software will use it to launch the Part 2 processes instead of the start command.

 

Note. Really big numbers had to be handled (Divisoptkimv2) by rejecting inputs longer than 15 digits: AutoIt automatically uses 64-bit integers and shifts to floating-point format under uncontrollable circumstances, which is no longer compatible with the calculations as programmed in my demo software. I planned to later develop a version with the BigNum library... May 5th: well, no, I definitely won't add anything to the existing demo software. The tested BigNum lib proved of poor value compared with my own hand-written procedures for long division and mod.

 

Efforts to extend a language beyond the limits of its original design create monsters. The demo software stands within these limits, yet reveals how some "impossible features" can be implemented in AutoIt as it stands. If you want an easier way, many other languages implement parallelism in their original designs.

 

 

Divisoptkimv2.au3 loopkim.au3

Edited by JPCare
One more instruction in Divisoptkim to exclude too large numbers

@JPCare 

Not sure why you would need an extra tool for this, while the command is natively available in Windows ...

Open the cmd prompt and type start /?

Quote

AFFINITY    Specifies the processor affinity mask as a hexadecimal number.
                The process is restricted to running on these processors.

                The affinity mask is interpreted differently when /AFFINITY and
                /NODE are combined.  Specify the affinity mask as if the NUMA
                node's processor mask is right shifted to begin at bit zero.
                The process is restricted to running on those processors in
                common between the specified affinity mask and the NUMA node.
                If no processors are in common, the process is restricted to
                running on the specified NUMA node.

Or you can open the Task Manager, go to any running process and right-click to set the affinity.


 

As far as I understand, affinity has nothing to do with parallel processing?

https://en.wikipedia.org/wiki/Processor_affinity

Quote

Processor affinity, or CPU pinning or "cache affinity", enables the binding and unbinding of a process or a thread to a central processing unit (CPU) or a range of CPUs, so that the process or thread will execute only on the designated CPU or CPUs rather than any CPU.

 


Yes, the start command and the affinity mask are the handiest way to launch parallel processes.

The proposed demo software works INTERNALLY exactly so. It finds the integer divisors of an integer N by launching THE SAME SEARCH PROCESS on several cpus for parallel execution, after allocating a specific range of integers to each process, then waiting for the results of the parallel processes for consolidation. This is demo software; the search algorithm is not optimized, and the aim is to show how true basic parallelism can be done.

There is no point in recalling that the affinity of running processes can be set in the Task Manager. The idea here is different: directly launch each parallel process on its own cpu from an AutoIt script.

This is how I understand "parallelism": it has to be controlled by my own software. To do so, you need a special design of your software, as explained in my message, which goes way beyond putting an affinity mask in the Run instruction.

As far as I know, no such way has been previously published with an example of solving a real problem - I mean, not just displaying a "hello" message.

No need to care for the NUMA details and the other sophisticated technicalities; a simple affinity mask does it. The demo software was tested on several versions of the Windows OS and several cpus, from XP to Win 11, from a simple dual core to 20 threads.

In AutoIt, this demo software demonstrates a double "exploit": true parallelism + intensive number crunching ONLY WITH BASIC INSTRUCTIONS.

Multithreading and Inter-Process Communication (IPC) are "implemented" in their most basic meanings. Multithreading = launching the same process on several cpus. IPC = waiting for the completion of each parallel process via the creation of its own file of results.

Yes, this is very basic, but it is the way to directly control the load on each cpu, whereas multithreading and IPC work in a virtually limitless, powerful environment.

I do NOT claim that multithreading and IPC are superfluous; it all depends on what you want to do!

Edited by JPCare

  • 1 month later...

A few more experiments....

1/ Updated the main AutoIt script to v3 with a timer, so that you no longer need a stopwatch.

Found a Task Manager (http://www.mitec.cz/tmx.html) that provides easier access to individual cpu loads.

 

2/ Developed Loop.go in the Go language, as a "ported" version of the AutoIt script, but following the strict programming rules of Go, and using the goroutine feature for parallelism.

 

3/ Found the Go version much faster than the AutoIt one, by a factor of 300 or more. For example, 10.6 s goes down to 24 ms, i.e. 0.024 s, in Go. The goroutine mechanism loads all the cpus, but to make this visible in the task manager graphics you have to add instructions that create a substantial load, for example an internal loop at the end of TheLoop incrementing a local uint64 variable from 0 to 10 000 000 000 in steps of 1.
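In the spirit of that ported Loop.go (which is not reproduced in this thread), here is a minimal goroutine sketch of the same idea: the candidate range 1..sqrt(N) is split into one segment per goroutine, each segment is scanned independently, and the partial results are consolidated at the end. Function and variable names are mine, not taken from Loop.go.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// divisors finds all integer divisors of n by scanning candidates up to
// sqrt(n) in parallel segments, one goroutine per segment.
func divisors(n uint64, workers int) []uint64 {
	// integer square root: largest limit with limit*limit <= n
	limit := uint64(1)
	for (limit+1)*(limit+1) <= n {
		limit++
	}
	parts := make([][]uint64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo := uint64(w)*limit/uint64(workers) + 1
			hi := uint64(w+1) * limit / uint64(workers)
			for d := lo; d <= hi; d++ {
				if n%d == 0 {
					parts[w] = append(parts[w], d, n/d) // both cofactors
				}
			}
		}(w)
	}
	wg.Wait()
	// consolidate, deduplicate (d == n/d when n is a square) and sort
	seen := map[uint64]bool{}
	var out []uint64
	for _, p := range parts {
		for _, d := range p {
			if !seen[d] {
				seen[d] = true
				out = append(out, d)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	fmt.Println(divisors(5040, 4)) // 5040 has 60 divisors
}
```

Each goroutine writes only to its own slot of parts, so no mutex is needed during the scan; the consolidation mirrors the result-file merge of the AutoIt version.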

 

End of experiment. Lessons are obvious.

 

AutoIt is OK for developing short scripts according to its original philosophy, and also for prototyping difficult routines in projects!

 

June 17th. Modified Loop.go for regular use of the compiled version. The original one was OK under Geany but did not handle the Windows console...

June 21st. Cosmetic modifications to Loop.go.

August 20th. Code cleaning : "final" versions, divisoptkim.au3 and loopkim.au3 (the one to be compiled)

 

 

 

 

 

Divisoptkim.au3 loopkim.au3

Edited by JPCare

  • 1 year later...

 

Eventually expanded the experiment to other languages, i.e. Python, Go, and C++/CUDA.

Link to my profile on ResearchGate, providing access to the article and code samples, including a revised version of the AutoIt program: https://www.researchgate.net/profile/Jean-Philippe-Carillon Abstract of the article by Claude (AI generated): "

Here is a summary of the key points from the document:

The document describes experiments with parallel computing in various programming languages and environments. The main goal is to test simple and truly parallel solutions for finding all divisors of an integer number. Parallel solutions are tested on CPU threads and GPU threads.

The document looks at parallel solutions in Python, Go, Go with BigInt package, AutoIt, C++, and CUDA. It shows how to implement parallelism even in languages like AutoIt that lack native support for it.

Key findings:

- Execution time decreases hyperbolically rather than linearly with more CPU threads. After 50% CPU utilization, marginal gains are small. This confirms Amdahl's law.

- Compiled code is not always faster than interpreted code, especially when using specialized math packages. Sometimes there is little difference.

- Virtual machines show no slowdown for computational tasks compared to native hardware.

- GPU parallelism does not automatically beat CPU parallelism. Benefits depend heavily on the computation. For this integer factorization problem, CPU solutions were faster.

- Adding "smarter" optimizations like filtering tests provided little benefit over a basic brute force algorithm. Interpreted languages were slowed down by more complex code.

- The OS competes with and does not properly support true parallelism in user programs. Its design goals are different - load balancing and resource sharing.

The document challenges some common assumptions about performance and parallelism through real tests. It also surfaces some open questions about OS design, software complexity, and attitudes around innovation.
"

 

 

