Some may have watched my “vision sensing” progression through:
- Easy PiCamera Sensor Class for GoPiGo3 robots (motion detect, color detect, light intensity and position) without OpenCV
- Raspberry Pi Camera feeding personally developed OpenCV routines (after taking a $100 class)
- AprilTags with PiCamera using OpenCV
- TensorFlowLite Object Recognition on the GoPiGo3 processor with Pi Cam
- TensorFlowLite ArUco Recognition on the GoPiGo3 processor with Pi Cam
- HuskyLens camera with fast but very limited object recognition models
- Oak-D-Lite Kickstarter - Stereo Depth and Color Camera with a Neural Net processor - $200
- ArUco Fiducial Markers using the Oak-D-Lite neural net processor running TensorFlowLite models using video from Oak-D-Lite Color camera
- Oak-D-Wide - Wide Angle Stereo Depth and Color Camera with a neural net processor - $350
all in an effort to get 3D vision mapping and localization running on a Raspberry Pi 5 on my ROS 2 GoPiGo3 robot.
Along the way, Raspberry Pi announced two levels of AI HATs to mate with the Raspberry Pi 5, confirming that vision and large language models really need their own dedicated processor (and continuing the distributed-processing robot architecture that “real robots” need).
(Actually, it appears “real robots” need a human wearing 3D goggles as one of the distributed processing units - Ref: the $20K Neo robot just announced for early purchase signup.)
The recently announced OpenMind OM1 “robot mind” requires its own dedicated 16GB Raspberry Pi 5 (or, better, a 16-core Intel chip with an NVIDIA RX7090), a separate vision large language model, a separate general-knowledge large language model, a separate natural-language speech recognition engine, and a separate text-to-speech engine to process the separate camera, separate microphone, and separate speaker, all to command the separate “smart” mobile robot platform with commands like “move grabber left” and “turn right a little”.
And what all the hype about large language models has shown is that they need a “Model Context Protocol” (MCP) layer to translate the world into large language model input and the LLM output into real-world actionable commands.
SO WHAT IS THIS POST REALLY ABOUT?
How does it relate to the GoPiGo3?
The GoPiGo3 is a great platform for learning:
- the basics of robot control,
- interfacing to sensors,
- operating systems, APIs (Application Programming Interfaces),
- “real time” response versus “time sliced” limitations,
- multiprocessing/multi-threading with communication/synchronization,
- programming and programming languages,
- human-robot user interfacing
and with the new $75 HuskyLens 2 “camera with built-in 6 TOPS AI processor and MCP,” the GoPiGo3’s Raspberry Pi 3 or Pi 4 processor has all the power needed to do vision-controlled navigation, person recognition and following, ArUco marker recognition, and even feed that off-board LLM if desired.
This $75 vision sensor packs so much functionality and so many possibilities for learning about modern robotics that folks can skip the entire brute-force approach I went through over the last seven years.
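To give a feel for how little code the GoPiGo3 side of a “person following” behavior needs, here is a minimal sketch. The easygopigo3 drive calls are the standard GoPiGo3 Python API; the read_tracked_block() helper, the frame width, and the steering percentages are my assumptions standing in for whatever interface the HuskyLens 2 Python library ends up providing - treat it as a skeleton, not working code.

```python
# Person-following skeleton for a GoPiGo3 fed by a smart vision sensor.
# Real API: EasyGoPiGo3 (set_speed, forward, steer, stop).
# Assumption: read_tracked_block() is a placeholder for the HuskyLens 2 read.
import time
from easygopigo3 import EasyGoPiGo3

FRAME_WIDTH = 320          # assumed camera frame width in pixels
CENTER_X = FRAME_WIDTH / 2
DEADBAND = 20              # pixels of "close enough to centered"

def read_tracked_block():
    """Placeholder: return (x_center, width) of the tracked person block,
    or None if nothing is recognized.  Replace with real HuskyLens 2 reads."""
    return None

egpg = EasyGoPiGo3()
egpg.set_speed(150)        # wheel speed in degrees per second

try:
    while True:
        block = read_tracked_block()
        if block is None:
            egpg.stop()                    # lost the person: stop and wait
        else:
            x, width = block
            error = x - CENTER_X
            if abs(error) < DEADBAND:
                egpg.forward()             # centered: drive straight
            elif error > 0:
                egpg.steer(100, 60)        # person to the right: curve right
            else:
                egpg.steer(60, 100)        # person to the left: curve left
        time.sleep(0.05)                   # ~20 Hz control loop
except KeyboardInterrupt:
    egpg.stop()
```

The same pattern - read a recognized block, compute the error from frame center, nudge the wheels - works just as well for homing on an ArUco marker.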
DISCLAIMER: I do not have any affiliation with the company, and I don’t get anything if you click on the link. I am just excited to hear what wonderful applications folks can come up with for the GoPiGo3 with a HuskyLens 2, and don’t want folks to waste time and GoPiGo3 processing cycles installing and learning OpenCV to recognize ArUco markers.
BTW, GoPiGo OS includes lessons on using TensorFlowLite with the PiCamera on the GoPiGo3’s Raspberry Pi for object recognition. This is the same underlying technology that these new camera-plus-AI-processor devices use, so I do recommend doing those GoPiGo OS lessons if your robot has the PiCamera (Model 2 camera version).
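For anyone curious what those lessons boil down to, here is a minimal single-frame sketch of TensorFlowLite object detection from the Pi camera. It assumes the tflite_runtime, picamera2, Pillow, and numpy packages and the standard quantized COCO SSD MobileNet files (detect.tflite / labelmap.txt) from the TFLite examples - those file names, and the use of picamera2 rather than the older picamera module the GoPiGo OS lessons may use, are my assumptions.

```python
# One-shot TensorFlowLite object detection from a Raspberry Pi camera frame.
# Assumes a quantized SSD MobileNet "detect.tflite" and its "labelmap.txt".
import numpy as np
from PIL import Image
from picamera2 import Picamera2
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
_, in_h, in_w, _ = input_details[0]["shape"]   # e.g. 1 x 300 x 300 x 3

with open("labelmap.txt") as f:
    labels = [line.strip() for line in f]

picam2 = Picamera2()
picam2.configure(picam2.create_preview_configuration(
    main={"size": (640, 480), "format": "RGB888"}))
picam2.start()

frame = picam2.capture_array()        # HxWx3 uint8 (channel order may be BGR;
                                      # close enough for this sketch)
resized = Image.fromarray(frame).resize((int(in_w), int(in_h)))
input_data = np.expand_dims(np.asarray(resized, dtype=np.uint8), axis=0)

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

# Classic SSD MobileNet output order: boxes, classes, scores, count
boxes = interpreter.get_tensor(output_details[0]["index"])[0]
classes = interpreter.get_tensor(output_details[1]["index"])[0]
scores = interpreter.get_tensor(output_details[2]["index"])[0]

for box, cls, score in zip(boxes, classes, scores):
    if score > 0.5:
        # Note: some labelmap files need a +/-1 index offset
        print(f"{labels[int(cls)]}: {score:.2f}  box={box}")

picam2.stop()
```

Wrap the capture-and-invoke part in a loop and you have the same “what do I see?” stream that the camera-plus-AI-processor devices deliver, just computed on the Pi itself.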