MoonDream Vision Language Assistant on GoPiGo3 Robot Kilted-Dave

Couldn’t resist trying the MoonDream vision language assistant on Dave and WaLI.

Now WaLI sports an 8GB Pi5 and Dave only has a 4GB Pi4, but why not.

Here is the pic:
Alan_WaLI_and_Dave

So first we ask Dave “What do you see?” (using the 693 MB MoonDream 0.5b model)

(moondream_006_venv) ubuntu@kilteddave:~/KiltedDave/systests/moondream/examples_with_API_006$ ./see_wali_and_dave.py 
Using moondream-0.5b model with moondream 0.0.6 Python API
Model Load Time: 19.13 seconds
Image Load and Encode Time: 36.69 seconds
Query Time: 55.68 seconds

What do you see?

The image features a man sitting next to a table with a collection of Lego toys.

There are various Lego models and a small camera on the table,

as well as a sign that reads “TBS-WORLD.”

The man appears to be posing for a photo with the Lego items,

showcasing the fun and creative aspects of the Lego hobby.

ubuntu@kilteddave:~/KiltedDave/ros2ws$ ./status.sh 

********** Humble Dave 2 ROS2 GoPiGo3 Status ***********
Saturday 02/21/26
 22:30:03 up  1:39,  3 users,  load average: 4.07, 1.49, 0.74
temp=68.6'C
frequency(48)=1800404352
throttled=0x0
GoPiGo3 Battery Voltage: 12.3 volts

               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       115Mi        17Mi       1.1Gi       1.1Gi
Swap:             0B          0B          0B
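
The throttled=0x0 line in that status output looks like the result of `vcgencmd get_throttled`, which reports a bit mask, so 0x0 would mean the Pi 4 was not throttling or under-volted even with all four cores busy. A minimal sketch for decoding such a line, assuming that is indeed where the value comes from (the helper name `decode_throttled` is made up for illustration; the bit meanings follow the Raspberry Pi documentation as I understand it):

```python
# Bit positions reported by `vcgencmd get_throttled` (assumed source of the value).
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(line):
    """Return the list of flags set in a line such as 'throttled=0x0'."""
    # Take the hex value after '=' and test each known bit.
    value = int(line.strip().split("=", 1)[1], 16)
    return [text for bit, text in THROTTLE_FLAGS.items() if value & (1 << bit)]


# The value from the status output above: no flags set.
print(decode_throttled("throttled=0x0"))
# A made-up value for comparison, with bits 0, 2, 16 and 18 set.
print(decode_throttled("throttled=0x50005"))
```

An empty list for 0x0 is the good case: nothing is throttling now and nothing has since boot.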

So using the half-billion-parameter moondream model on a 4GB Raspberry Pi 4:

  • Used 100% of the Pi 4 (all four cores)
  • Used 1.9GB memory
  • Took 19 seconds to load the model
  • Took 37 seconds to “encode” the image
  • Took 56 seconds to answer “What do you see?”
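
For anyone wanting to collect the same three timings, here is a minimal sketch of one way to do it. It is not the actual see_wali_and_dave.py script. The three stage functions are hypothetical stand-ins so the sketch runs anywhere, and the image file name is assumed. With the moondream Python package I believe the real calls would be along the lines of `md.vl(model=...)`, `model.encode_image(image)` and `model.query(encoded, question)["answer"]`, but treat those names as assumptions and check them against the version you have installed.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed seconds


@contextmanager
def timed(label):
    """Record and print the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start
        print(f"{label}: {timings[label]:.2f} seconds")


# Hypothetical stand-ins for the three stages; replace the bodies with the
# real model load, image encode, and query calls of your vision library.
def load_model():
    return "model"


def encode_image(model, path):
    return ("encoded", path)


def query(model, encoded, question):
    return "answer text"


with timed("Model Load Time"):
    model = load_model()

with timed("Image Load and Encode Time"):
    # File name and extension are assumed for illustration.
    encoded = encode_image(model, "Alan_WaLI_and_Dave.jpg")

with timed("Query Time"):
    answer = query(model, encoded, "What do you see?")

print(f"Total: {sum(timings.values()):.2f} seconds")
print(answer)
```

Timing each stage separately matters here because the encoded image can be reused: a second question about the same picture only pays the query time, not the load or encode time.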

Now turning to TurtleBot5-WaLI with the 2-billion-parameter moondream model on an 8GB Raspberry Pi 5:

  • Used 100% of the Pi 5 (all four cores)
  • Used 5GB memory
  • Took 21 seconds to load the (2.04GB) model
  • Took 29 seconds to “encode” the image
  • Took 41 seconds to answer “What do you see?”

and what it said:

The image features a man sitting next to a robot, which is placed on a table.

The robot is positioned in the center of the table, while the man is situated on the left side.

The robot appears to be a small, possibly toy-like, model.

==============

Now let’s see what Dave can do if we give him the hint:

In this picture there is a man and two robots. Describe the robots.
Query Time: 37.45 seconds

The image shows a man and two robots.

One of the robots is a TARDIS, a type of robot that resembles the character from the television series Doctor Who.

The other robot is likely a TARDIS-themed robot, but it is not possible to confirm their exact relationship or association with the man in the image.

====

Seems like moondream hallucinates as well as Ollama does, so I'm not sure how useful this “Vision Language Assistant” would be for Dave.