MoonDream Vision Language Assistant on GoPiGo3 Robot Kilted-Dave

Couldn’t resist trying the MoonDream vision language assistant on Dave and WaLI.

Now, WaLI sports an 8GB Pi 5 and Dave only has a 4GB Pi 4, but why not.

Here is the pic:
Alan_WaLI_and_Dave

So first we ask Dave “What do you see?” (using the 693 MB MoonDream 0.5b model)

(moondream_006_venv) ubuntu@kilteddave:~/KiltedDave/systests/moondream/examples_with_API_006$ ./see_wali_and_dave.py 
Using moondream-0.5b model with moondream 0.0.6 Python API
Model Load Time: 19.13 seconds
Image Load and Encode Time: 36.69 seconds
Query Time: 55.68 seconds

What do you see?

The image features a man sitting next to a table with a collection of Lego toys. There are various Lego models and a small camera on the table, as well as a sign that reads “TBS-WORLD.” The man appears to be posing for a photo with the Lego items, showcasing the fun and creative aspects of the Lego hobby.
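The script itself is not listed here. As a rough sketch of what a script like see_wali_and_dave.py might look like with the moondream 0.0.6 Python client, something along these lines would produce the timing output above (the model file name and image path below are assumptions, not the actual files):

#!/usr/bin/env python3
# Rough sketch, not the actual see_wali_and_dave.py. Assumes the moondream
# 0.0.6 Python client and a locally downloaded .mf model file.
import time
from PIL import Image
import moondream as md

MODEL_PATH = "moondream-0_5b-int8.mf"   # assumed model file name
IMAGE_PATH = "Alan_WaLI_and_Dave.jpg"   # assumed image file name

start = time.time()
model = md.vl(model=MODEL_PATH)          # load the local model
print(f"Model Load Time: {time.time() - start:.2f} seconds")

start = time.time()
image = Image.open(IMAGE_PATH)
encoded = model.encode_image(image)      # one-time image encode
print(f"Image Load and Encode Time: {time.time() - start:.2f} seconds")

question = "What do you see?"
start = time.time()
answer = model.query(encoded, question)["answer"]
print(f"Query Time: {time.time() - start:.2f} seconds\n")
print(question + "\n")
print(answer)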

ubuntu@kilteddave:~/KiltedDave/ros2ws$ ./status.sh 

********** Humble Dave 2 ROS2 GoPiGo3 Status ***********
Saturday 02/21/26
 22:30:03 up  1:39,  3 users,  load average: 4.07, 1.49, 0.74
temp=68.6'C
frequency(48)=1800404352
throttled=0x0
GoPiGo3 Battery Voltage: 12.3 volts

               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       115Mi        17Mi       1.1Gi       1.1Gi
Swap:             0B          0B          0B
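status.sh itself isn’t shown; a rough Python stand-in that collects the same readings might look like the sketch below (vcgencmd is the standard Raspberry Pi utility, and the battery read assumes the easygopigo3 library is installed):

#!/usr/bin/env python3
# Illustrative stand-in for the status.sh readings above, not the actual script.
import subprocess

def run(cmd):
    return subprocess.check_output(cmd, shell=True, text=True).strip()

print(run("date '+%A %m/%d/%y'"))
print(run("uptime"))
print(run("vcgencmd measure_temp"))       # e.g. temp=68.6'C
print(run("vcgencmd measure_clock arm"))  # e.g. frequency(48)=1800404352
print(run("vcgencmd get_throttled"))      # e.g. throttled=0x0

try:
    from easygopigo3 import EasyGoPiGo3   # GoPiGo3 Python library (assumed installed)
    print(f"GoPiGo3 Battery Voltage: {EasyGoPiGo3().volt()} volts")
except Exception as err:
    print(f"GoPiGo3 battery read skipped: {err}")

print(run("free -h"))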

So using the half-billion-parameter MoonDream model on a 4GB Raspberry Pi 4:

  • Used 100% of the Pi 4 (all four cores)
  • Used 1.9GB memory
  • Took 19 seconds to load the model
  • Took 37 seconds to “encode” the image
  • Took 56 seconds to answer “What do you see?”

Now turning to TurtleBot5-WaLI with the 2-billion-parameter MoonDream model on an 8GB Raspberry Pi 5:

  • Used 100% of the Pi 5 (all four cores)
  • Used 5GB memory
  • Took 21 seconds to load the (2.04GB) model
  • Took 29 seconds to “encode” the image
  • Took 41 seconds to answer “What do you see?”

and what it said:

The image features a man sitting next to a robot, which is placed on a table. The robot is positioned in the center of the table, while the man is situated on the left side. The robot appears to be a small, possibly toy-like, model.
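The only code difference from the Dave run should be the model file handed to md.vl(); a hedged example (the file names are assumptions, sized per the 693 MB and 2.04GB figures above):

import moondream as md  # same client library as in the earlier sketch

MODEL_PATH_05B = "moondream-0_5b-int8.mf"  # ~0.7 GB; fits the 4GB Pi 4
MODEL_PATH_2B  = "moondream-2b-int8.mf"    # ~2 GB; needs the 8GB Pi 5

model = md.vl(model=MODEL_PATH_2B)         # WaLI run: load the 2b model instead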

==============

Now let’s see what Dave can do if we give him a hint:

In this picture there is a man and two robots. Describe the robots.
Query Time: 37.45 seconds

The image shows a man and two robots. One of the robots is a TARDIS, a type of robot that resembles the character from the television series Doctor Who. The other robot is likely a TARDIS-themed robot, but it is not possible to confirm their exact relationship or association with the man in the image.
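If the encoded image from the first query is kept in memory, a follow-up question like this only needs another query call and skips the encode step; a sketch reusing the model and encoded objects from the earlier example:

# Follow-up question against the already-encoded image (reuses `model` and
# `encoded` from the earlier sketch; no re-encode needed).
hint = "In this picture there is a man and two robots. Describe the robots."
print(model.query(encoded, hint)["answer"])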

====

Seems like MoonDream hallucinates as well as Ollama does - not sure how useful this “Vision Language Assistant” would be for Dave.

Looking In A Mirror

Seeing yourself in a mirror is the foundation of developing a “self”, so what will MoonDream see when asked “What do you see?”:

Image of Dave with 0.5b model on 8GB Pi5

The image features a collection of electronic components, including a camera, a fan, and a fan, all attached to a wall. There is also a poster or a sheet of paper with text on it, which is partially visible in the image.

And Dave, with the 4GB Pi 4 and the 0.5b model, “looks at himself”:

(moondream_006_venv) ubuntu@kilteddave:~/KiltedDave/systests/moondream/examples_with_API_006$ ./see_dave.py

Using moondream-0.5b model with moondream 0.0.6 Python API
Using image of Dave

Model Load Time: 17.65 seconds
Image Load and Encode Time: 74.60 seconds

What do you see?

Query Time: 24.72 seconds

The image features a collection of electronic components, including a small electronic device, a power supply, and a circuit board. The components are displayed in a clear plastic bag, which is placed on a table.

Me: Yes, I need to put Dave in a plastic bag and place him to rest on a shelf

Image of Dave with 2b model (Pi 5 only) !!YES!!

The image features a robot with a Minion head, which is a Minion character from the popular movie franchise. The robot is positioned on a white surface, and it appears to be connected to a power source. The robot is also adorned with a variety of wires, which are likely used for its various functions and components.

Image of WaLI with 0.5b model

The image features a large, black, dome-shaped device with a speaker and a screen. The speaker is located in the middle of the device, while the screen is positioned towards the top. The device appears to be a speaker with a dome-shaped structure, designed for audio and audio purposes.

Image of WaLI with 2b model on Pi5

The image features a robot with two large eyes, sitting on a tiled floor. The robot has a black and yellow color scheme, and it appears to be a toy or a model of a robot.

Interesting

It is interesting that the 2b model recognized a Minion character but not the WALL-E character, in spite of WaLI having the name right in front.

The 2b model recognized that both Dave and WaLI are robots.

In some cases MoonDream running on the 8GB Pi 5 and the 4GB Pi 4 returns the same result, but not always.

Google Gemini Nails It With Flattery (ain’t no Raspberry Pi talkin’)

  === GoPiGo3 Robot Kilted-Dave ===

That is quite the impressive DIY robotics build! It looks like a custom-made mobile robot, playfully dubbed “Kilted-Dave” based on the sign in the background.

Here is a breakdown of what I see in your setup:

The “Brain” and Body

  • The Minion: Sitting atop the “throne” is a Minion figure (Dave, presumably), giving the bot a great personality.
  • Tiered Chassis: The robot is built using a multi-level clear acrylic (or polycarbonate) chassis. This is a classic maker design that allows for easy mounting of sensors and boards while keeping the wiring visible.
  • Drivetrain: It features two yellow motor-driven wheels at the front (differential drive), likely supported by a caster wheel in the back for balance.

Visible Sensors and Electronics

  • LiDAR Unit: Just below the top platform, there appears to be a 360-degree rotating LiDAR sensor. This is likely used for SLAM (Simultaneous Localization and Mapping) so Dave can navigate without bumping into your furniture.
  • Stereo Depth Camera: Mounted on the front is a wide sensor bar (it looks like an Intel RealSense or similar depth camera), which provides 3D vision and object recognition.
  • Microcontrollers/SBCs: I can see green status LEDs glowing from inside the stack, suggesting a powerful Single Board Computer (like a Raspberry Pi or Jetson Nano) acting as the primary controller.

The Navigation Aids

  • AprilTag/Fiducial Marker: The black-and-white square pattern on the “Kilted-Dave” sign is a fiducial marker. Robots use these as “landmarks” to precisely identify their location or a specific docking station.
  • Docking Area: It looks like the robot is currently positioned near a charging station or a designated “home” base on a floor mat.

This looks like a serious robotics project—possibly running ROS (Robot Operating System) given the complexity of the sensor suite.

Are you currently working on autonomous navigation or mapping with him?

well, lucky you! You now have a TARDIS! Where you going with that???


Yeah - #VanLife - Somewhere and some dimension without the anomaly of this place. I just have to change one more line in Dave’s programming and I’m sure I’ll have it working.
