If your robot needs more CPU power from Python,

rolly · March 6, 2017, 12:57am

… you’re running it on a Raspberry Pi 2 or 3, and there are parts of your workload that you can cleave off to work in parallel, there’s something really neat you can do!
Use Python’s multiprocessing support to make use of the 3 extra CPUs, for up to four times the CPU power.
Documentation showing all the powerful ways the application parts can communicate across their separate CPUs is here.
I’ve uploaded a simple example. As the example illustrates, variables used across the separate CPUs must always be used in a more formal way (shared memory).
I hope you find this useful.
multiCpuTest.py (2.2 KB)
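The attachment itself isn’t reproduced in this thread. As a rough sketch of the same idea (not the contents of multiCpuTest.py; the names `count_up` and `run_workers` are made up for illustration), three worker processes can update one counter held in shared memory like this:

```python
import multiprocessing as mp


def count_up(counter, n):
    """Runs in a separate process: add 1 to the shared counter n times."""
    for _ in range(n):
        # The Value carries its own lock; holding it makes the
        # read-modify-write below safe across processes.
        with counter.get_lock():
            counter.value += 1


def run_workers(num_workers=3, n=1000):
    # "i" means a C int. The Value lives in shared memory, so every
    # process sees the same number.
    counter = mp.Value("i", 0)
    procs = [mp.Process(target=count_up, args=(counter, n))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    # The guard matters on platforms that spawn (rather than fork) workers.
    print(run_workers())  # 3000
```

A plain Python variable would not work here: each process would get its own copy and the parent would still see 0.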

graykevinb · March 6, 2017, 4:12am

Thanks!

Shane.gingell · March 23, 2017, 4:27am

This is something that I had planned to do with my fast moving computer vision robot.

The plan was to use 3 cores

First core was to operate motor control
Second core was to process the next image while the motor was carrying out the last motor command
Third core was to be handling the I/O of the next frame from camera memory to computer memory.

In theory I should be able to run all 3 of these processes in parallel; at the moment I am doing them in series. It is a case of me getting my head around how to write the code to achieve this.

Would this be how it would work?
I define my image array as shared memory
I define my motor power as shared memory
I define a routine to read the image data from the camera
I define a routine to operate the motors
I then have my main program start up both routines to run in parallel
proc = Process(target=watchTheTime, args=(timeLastSaw,)) #Consume another cpu - TRAILING COMMA NEEDED
proc.daemon = True
proc.start()
Then I can just have my main program processing the image array from the shared memory and writing the power back to the shared memory

rolly · March 23, 2017, 2:28pm

Make sure your application lends itself to this usage. Unless cpu1 is acquiring the current image, CPU 2 is processing the previous image, and CPU 3 is carrying out the motor commands from the previous image processing (or some such), I doubt you have a useful multi-CPU application.
The web page I referenced has other forms of inter-process communication besides shared memory (like queues, worker pools, etc.).
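As one illustration of the queue option, here is a minimal sketch of a staged pipeline, assuming fake frames (short lists of integers) in place of real camera images. The function names and the "sum the pixels" processing step are placeholders, not anything from the robot code discussed here.

```python
import multiprocessing as mp


def capture(frames_out, n_frames):
    """Stage 1: stand-in for camera I/O; emits fake frames."""
    for i in range(n_frames):
        frames_out.put([i] * 4)
    frames_out.put(None)  # sentinel: no more frames


def process_frames(frames_in, commands_out):
    """Stage 2: stand-in for image processing; frame -> motor power."""
    while True:
        frame = frames_in.get()
        if frame is None:
            commands_out.put(None)  # pass the sentinel down the line
            break
        commands_out.put(sum(frame))


def run_pipeline(n_frames=5):
    # Small bounds keep an early stage from running far ahead of a later one.
    frames = mp.Queue(maxsize=2)
    commands = mp.Queue(maxsize=2)
    stages = [
        mp.Process(target=capture, args=(frames, n_frames)),
        mp.Process(target=process_frames, args=(frames, commands)),
    ]
    for s in stages:
        s.start()
    # Stage 3 (motor control) runs in the parent process for simplicity.
    applied = []
    while True:
        power = commands.get()
        if power is None:
            break
        applied.append(power)
    for s in stages:
        s.join()
    return applied


if __name__ == "__main__":
    print(run_pipeline())  # [0, 4, 8, 12, 16]
```

Queues copy (pickle) each item between processes, which is simple and safe but adds overhead for large frames; shared memory avoids the copy at the cost of explicit locking.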

Shane.gingell · March 23, 2017, 10:09pm

Yes this is my plan. Each process will be on a different image.

The motor control is at the end of the production.

The image processing part is on the next frame getting it ready in advance for the motor control

The part that does the I/O of the frame from camera to computer memory is one frame further ahead again, getting it ready for the processing part.

So the whole thing works like a production line in a factory.

With a bit of research I found this command, “thread.start_new_thread ( function, args[, kwargs] )”. It seems the most basic and easiest way for me to start up each of the 2 extra processes. It says it will run separate processes but with shared memory.

rolly · March 23, 2017, 10:27pm

Be careful to distinguish between threads (which share the current CPU, because standard Python lets only one thread run Python code at a time) and processes (which use the other CPUs and truly execute in parallel).

Shane.gingell · March 23, 2017, 10:54pm

When you start a new thread, Python won’t run it on a separate CPU???

Shane.gingell · March 23, 2017, 11:04pm

OK, I just did a bit of reading and now understand the difference.

Threading uses the same memory space, so standard Python has a Global Interpreter Lock (GIL) that lets only one thread run Python code at a time, which stops 2 threads corrupting the same memory.

Multiprocessing uses separate memory spaces, so the processes don’t share a Global Interpreter Lock and can truly run in parallel, but the downside is that sharing data between the 2 is harder.

It seems the more complex multiprocessing will achieve the speed increase that I want much better.
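The memory difference is easy to see in a small experiment. This is only a sketch with made-up names (`results`, `append_marker`, `demo`): a thread and a child process each append to the same ordinary module-level list, and only the thread’s change shows up in the parent.

```python
import multiprocessing as mp
import threading

results = []  # an ordinary list, NOT shared memory


def append_marker(tag):
    results.append(tag)


def demo():
    results.clear()

    # A thread runs inside this process, so it appends to this very list.
    t = threading.Thread(target=append_marker, args=("thread",))
    t.start()
    t.join()

    # A child process works on its own copy of the list; its append
    # never reaches the parent.
    p = mp.Process(target=append_marker, args=("process",))
    p.start()
    p.join()

    return list(results)


if __name__ == "__main__":
    print(demo())  # ['thread']
```

That is why data crossing a process boundary has to go through something explicit such as a Value, an Array, or a Queue.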

Shane.gingell · March 24, 2017, 12:40am

@rolly Now one more important question.

I won’t have a problem with 2 separate processes writing to the same memory, as each process passes data down the line and none pass it back up the line. That is, the camera I/O process writes the image array; the image processing process reads the array, processes it, then writes the motor power output variable; and the motor control process reads the power output variable and controls the motors based on the value it read.

Since separate processes don’t share a GIL, will it be a problem that the camera I/O process could be writing the frame array while the image processing process is reading it? The frame array is a NumPy array, which I believe has some low-level GIL handling.

Shane.gingell · March 24, 2017, 12:55am

OMG, it just seems to keep getting more complex the more I read.

It seems that threading may be best for the camera I/O, as it doesn’t use the CPU but rather blocks it and makes it stand still until the I/O operation is over. So using threading for the camera I/O allows the CPU to do the processing at the same time as the camera I/O is happening, and provides a GIL for the array read/write.

It may be best to use multiprocessing for the motor control process, as it requires CPU power but little data sharing, so it should be OK without a shared GIL.

rolly · March 24, 2017, 1:13am

If you create your shared variables with the Value call, and read or write them through the .value attribute (as in the sample program I included at the start of this thread), you should be OK. NumPy is not going to use .value, so copy what it’s using to a shared array. Put a lock on the shared frame array so its consumer on the other CPU won’t be reading it while it’s only partly written.
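A minimal sketch of that locking pattern follows, using only the standard library and a tiny 8-byte stand-in for a frame (the names `producer`, `read_frame`, `run_demo` and `FRAME_SIZE` are hypothetical). With a real NumPy frame you would copy its bytes into the shared array inside the same locked section.

```python
import multiprocessing as mp

FRAME_SIZE = 8  # hypothetical tiny "frame" of 8 bytes


def producer(shared_frame, frame_no, n_frames):
    """Write whole frames under the lock so a reader never sees half of one."""
    for i in range(1, n_frames + 1):          # keep n_frames <= 255 (one byte)
        new_frame = bytes([i]) * FRAME_SIZE   # stand-in for a camera frame
        with shared_frame.get_lock():
            shared_frame[:] = new_frame
            frame_no.value = i


def read_frame(shared_frame, frame_no):
    """Copy the frame out while holding the same lock."""
    with shared_frame.get_lock():
        return frame_no.value, bytes(shared_frame[:])


def run_demo(n_frames=200):
    shared_frame = mp.Array("B", FRAME_SIZE)  # "B" = unsigned byte, zero-filled
    frame_no = mp.Value("i", 0)
    p = mp.Process(target=producer, args=(shared_frame, frame_no, n_frames))
    p.start()
    consistent = True
    while p.is_alive():
        _, frame = read_frame(shared_frame, frame_no)
        # Every byte of a complete frame is identical in this toy setup.
        consistent = consistent and len(set(frame)) == 1
    p.join()
    number, frame = read_frame(shared_frame, frame_no)
    return consistent, number, frame


if __name__ == "__main__":
    print(run_demo())
```

The reader here spins as fast as it can purely to stress the lock; a real consumer would wait for a "new frame" signal instead.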

Shane.gingell · March 24, 2017, 1:51am

It will take me a bit of time to write the code to time and test different methods but I am going to try 2 different things.

1st, doing what you suggest and having 3 processes: creating a shared array, copying from the NumPy array to the shared array, then locking when reading/writing. Obviously this has 2 disadvantages: 1. you’re adding another operation (copying the array); 2. the camera I/O process is always in a state of writing (as soon as it finishes writing, it writes again), so this may complicate the lock/unlock sync for the image processing process to read the array.

Please correct me if I am wrong, as I am learning on the fly here. My understanding is that the advantage of multiprocessing is that it utilizes more cores (CPUs), i.e. if your current process is using 100% of your CPU, you get more speed by putting some of the CPU work onto another CPU. Threading allows multiple operations to happen on the same CPU, so if that CPU is at 100% then threading won’t speed it up. But if your CPU isn’t bogged down by computation and is just standing idle while waiting for an I/O operation to finish, then threading will speed things up, as it allows the CPU to do the computing while the I/O is happening in the background.

**2nd thing to try:** spawning a thread to do the I/O from camera to NumPy array, then making 2 shared Value variables, left_motor_power and right_motor_power, both set to 0, then spawning a process for motor control, then starting the main loop that processes the NumPy array from the thread and updates the shared variables for the motor process.

rolly · March 24, 2017, 2:18am

I don’t mean to set myself up as an expert on this topic. I have a BrickPi+ robot with a lidar that is continuously streaming data (range and bearing over 360 degrees) in over USB. I set up a CPU to just receive and decode the data. That sounds similar to your video stream, although less onerous. Yes, using another CPU can free up cycles for the main application’s CPU, but it can also allow the extra CPU to give its full attention to, say, your video stream.
This does come at a cost of complexity communicating between the CPUs.

Shane.gingell · March 24, 2017, 3:13am

Good to know that you’re playing with lidar, as it is on my radar to have a play with at some stage. It would be interesting to combine computer vision and lidar together, and maybe even with AI.

The speed (FPS) can be easily monitored by taking start_time at the beginning, having a counter that counts loops around the process, then taking finish_time when finishing. Then total_time = finish_time - start_time and FPS = counter / total_time.
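That calculation could be wrapped up roughly as follows. It is only a sketch: `measure_fps` and `step` are names chosen here, and `step` stands for whatever one pass of the loop does (grab, process, display).

```python
import time


def measure_fps(step, loops=1000):
    """Run step() `loops` times and return loops per second."""
    start_time = time.perf_counter()
    counter = 0
    for _ in range(loops):
        step()
        counter += 1
    finish_time = time.perf_counter()
    total_time = finish_time - start_time
    return counter / total_time


if __name__ == "__main__":
    # Fake workload: pretend each frame takes about 10 ms.
    print(round(measure_fps(lambda: time.sleep(0.01), loops=50)), "FPS")
```

Timing each stage separately with the same helper shows which stage is the bottleneck.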

This is what I have found about computer vision: you’re only as fast as your slowest bottleneck. I was expecting the processing of the image to be the slowest part. After timing and calculating FPS, I now know motor control is by far the slowest part. What also surprised me is that CPUs are now so fast that the processing of the image can outrun the I/O of the image transfer from the camera to the computer memory. So the CPU processing of the image is the fastest of all the stages. The I/O is much slower and doesn’t use the CPU, so increasing CPU speed won’t help with the I/O bottleneck. Multiprocessing gives you more CPU capacity but creates more I/O bottlenecks between processes. Threading doesn’t add CPU capacity but does let multiple threads take turns on the one CPU. Looking on other forums about computer vision, they use threading to get around I/O bottlenecks because, in their words, a blocking I/O operation leaves the CPU idle until it’s finished, but if the I/O is on a separate thread the CPU isn’t blocked and can be used for image processing, i.e. the I/O operation and the CPU operations can happen at the same time.

Motor control, on the other hand, doesn’t require a lot of I/O from the main program, so I think it would suit using multiprocessing to run on its own core.

Shane.gingell · March 25, 2017, 9:01am

@rolly
This may interest you Rolly for the development of your robot.
I have had a little time tonight to sit down, break my program into sections, and time each section doing 1000 loops, leaving out the motor control.
First, all 3 parts of the program: I/O from camera, then processing of the image, then another I/O of data to the computer screen. Speed was 50 FPS.
Second, I grabbed one image, then sent the same image each time for processing, without updating it, then I/O to the screen. Speed was 240 FPS.
Third, I grabbed one image, then sent the same image around for processing but didn’t send the result to the screen. Speed was 550 FPS.
Fourth, I just did the I/O of frames from camera to computer but didn’t do anything with them. Speed was 56 FPS.
Fifth, I played with camera settings and kept running the I/O of frames from camera to computer until I found the best camera settings for speed. Speed was 71 FPS.

As you can see, doing just the CPU processing without any I/O gave 550 FPS, but I/O from the camera peaked at 71 FPS, and I/O to the screen for the user to see dropped the speed from 550 FPS to 240 FPS. So this tells me that I/O is a far bigger bottleneck than CPU processing.

Shane.gingell · March 25, 2017, 9:06pm

Now here is some more info after testing threading to try to solve the I/O bottleneck.
I could get a maximum of 71 FPS out of my camera when just grabbing frames and doing nothing with them.
I could get a maximum of 50 FPS out of the whole process (grab frames, process, and write to screen).
When I put the I/O of grabbing frames on its own thread, the processing rate went very high, but of course it was outrunning the I/O, so it ended up processing the same frame more than once because the I/O hadn’t updated it. So I added a global flag called read_frame: once a frame is processed it is set to true, when the I/O updates a frame it is set to false, and I only process a frame when the flag is false, so that I only process new frames. This gave me a frame rate of only 36 FPS, lower than my original 50 FPS without threading. I then thought maybe the GIL was blocking the I/O operation and that I needed the I/O in the main program and the processing in the second thread, to give priority to the I/O, but again the result was 36 FPS. So both ways, threading slowed down my I/O. It sped up my processing of the image if I didn’t make it wait for a new frame, but of course it is just silly to be processing frames that have already been processed. In the end the I/O wasn’t sped up; in fact it was slowed down.
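The code behind these numbers isn’t shown, so the following is only a guess at one thing worth trying: if the processing loop polls the global flag in a tight loop, it competes with the I/O thread for the GIL. A `threading.Event` lets the processing side sleep until a fresh frame arrives instead. All names below are hypothetical, and the camera is faked with `time.sleep`.

```python
import itertools
import threading
import time


class FrameGrabber:
    """Background thread that hands each new frame to the consumer at most once."""

    def __init__(self, grab, n_frames):
        self._grab = grab                  # hypothetical blocking camera read
        self._n_frames = n_frames
        self._frame = None
        self._lock = threading.Lock()
        self._new_frame = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        for _ in range(self._n_frames):
            frame = self._grab()           # blocking waits usually release the GIL
            with self._lock:
                self._frame = frame
                self._new_frame.set()      # wake the consumer

    def next_frame(self, timeout=1.0):
        """Sleep until a fresh frame arrives; return None on timeout."""
        if not self._new_frame.wait(timeout):
            return None
        with self._lock:
            self._new_frame.clear()
            return self._frame


def make_fake_camera():
    ids = itertools.count(1)

    def grab():
        time.sleep(0.005)                  # pretend to wait on the camera
        return next(ids)
    return grab


def collect_frames(n_frames=10):
    grabber = FrameGrabber(make_fake_camera(), n_frames)
    grabber.start()
    seen = []
    while True:
        frame = grabber.next_frame(timeout=0.5)
        if frame is None:
            break
        seen.append(frame)                 # image processing would go here
    return seen


if __name__ == "__main__":
    print(collect_frames())
```

With this design a slow consumer skips stale frames rather than processing one twice. Whether it recovers the lost frame rate on a given camera and driver is something only timing can show.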