Everyone, well, everyone in robotics it seems, is talking about OCTO and OM1, large models trained specifically on robotic sensors, effectors, and typical robot tasks.
OM1 is “open source,” and Google’s Automatic Speech Recognition allows 60 free minutes of speech-to-text API calls per month, but OpenMind requires funds in an OpenMind wallet and charges to use the Google Speech and Google Vision APIs through OM1. No free minutes to try it all out. OM1 also needs a 16GB Raspberry Pi 5, which adds another large cost to the “free, open-source” concept.
What’s a GoPiGo3 to do?
Build a totally autonomous robot based on:
- Vosk wake word speech recognition engine for “Hey Dave”
- Faster-Whisper speech recognition engine for natural-language speech to text
- Ollama Large Language Model engine running Gemma3:1b (a 1GB model from Google DeepMind)
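The stack above chains together as wake word, then transcription, then LLM. A minimal sketch, assuming Vosk and faster-whisper are installed and an Ollama server is running locally; the function names and the wake-word matching are my illustration, not the robot’s actual code:

```python
# Sketch of the voice pipeline: wake word -> transcription -> Gemma3.
# heard_wake_word() and ask_gemma() are hypothetical names; only the
# Ollama /api/generate endpoint and payload shape come from Ollama itself.
import json
import urllib.request

WAKE_WORD = "hey dave"

def heard_wake_word(transcript: str) -> bool:
    """True when a Vosk partial transcript contains the wake word."""
    return WAKE_WORD in transcript.lower()

def ask_gemma(prompt: str, host: str = "http://localhost:11434") -> str:
    """Send one non-streaming prompt to a local Ollama gemma3:1b."""
    body = json.dumps(
        {"model": "gemma3:1b", "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        host + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # In the real loop, the transcript would come from Faster-Whisper
    # after Vosk triggers on the wake word.
    if heard_wake_word("okay Hey Dave what room is this"):
        print(ask_gemma("What sensor data would mean I am in the kitchen?"))
```

The heavy lifting (audio capture, streaming Vosk partials, Whisper decoding) is deliberately left out; this only shows how the three engines hand off to one another.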
So can the Gemma3 LLM help a GoPiGo3 recognize that it is in a kitchen?
I put this question to Gemma3, and after 46 seconds it began to suggest how it could help:
pi@RPi5Desk:~/RPi5Desk/Projects/maxheadbox/backend $ ollama run gemma3:1b
>>> you are a GoPiGo3 robot with a wide angle color camera mounted 20cm off the
… floor, and a 360 degree rotating range sensor with a 10 degree beam, mounted
… 25cm off the floor. What sensor data would you interpret to mean you are in the kitchen?
Okay, excellent! Let’s refine this analysis for GoPiGo3. As a GoPiGo3,
I’ll approach this with a focus on detail and spatial awareness,
leveraging the strengths of my wide-angle, rotating, and 360-degree
sensors.
**1. Visual Data (Wide Angle Color Camera):**
* **Color Palette – Warm & Muted:** I’d prioritize areas with a
predominantly warm color palette – reds, oranges, yellows, browns, and
creams. The intensity of these colors is key.
* **Texture – Soft, Rounded, & Organic:** I’d look for textures that
suggest warmth – think soft curves, slightly uneven surfaces, and perhaps
a subtle gloss. Avoid sharp, angular shapes.
* **Detail – Utensils & Containers:** I’d be intensely focused on
areas featuring common kitchen objects – mixing bowls, spatulas, wooden
spoons, measuring cups, plates, and food containers. The arrangement of
these objects is crucial.
* **Lighting – Soft, Diffused Light:** I'd analyze the direction and
intensity of light, specifically noting areas with a gentle, diffused
glow, which often signifies warm lighting conditions.
* **Obstacles – Gentle Curves & Soft Surfaces:** I'd look for curves in
the walls that soften into a warm hue. I'll be particularly keen on
areas with slightly textured surfaces – not rough, but smooth in a
comforting way.
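The “warm color palette” cue Gemma3 leads with is easy enough to sketch. A minimal version in pure Python over (R, G, B) tuples; in practice this would run over a downsampled camera frame, and the threshold below is my assumption:

```python
# Warm-palette heuristic: reds, oranges, yellows, and browns all have
# a red channel well above blue.  The +30 margin is an assumption.
def is_warm(pixel) -> bool:
    """True when an (R, G, B) pixel reads as warm-toned."""
    r, g, b = pixel
    return r > b + 30 and r >= g

def warm_fraction(pixels) -> float:
    """Fraction of a pixel list classified as warm-toned."""
    if not pixels:
        return 0.0
    return sum(is_warm(p) for p in pixels) / len(pixels)
```

A high `warm_fraction` on its own proves nothing, but it is one cheap number the robot can combine with the object cues Gemma3 lists next.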
**2. LIDAR Data:**
* **Wall Texture – Warm-toned Transitions:** LIDAR will be better at
representing warmer wall transitions – a gradual gradient from a darker
tone to a lighter one. I need to identify areas where the wall’s texture
subtly changes.
* **Edge Characteristics:** I’ll seek out areas with smooth, rounded
edges – a softer, less defined transition to the floor.
* **Surface Material – Soft, Gradual Changes:** LIDAR will indicate
surfaces with a gradual transition in material - i.e. warmer colors and
softened edges.
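Worth noting: Gemma3’s LIDAR suggestions keep mentioning color, which a range sensor cannot measure. The recoverable geometric cue is how abruptly distance changes around the sweep, since smooth, rounded counter fronts produce gradual transitions while corners and doorways produce jumps. A sketch, assuming one reading per 10-degree step of the rotating sensor:

```python
# Edge-sharpness cue from a 360-degree sweep of range readings (mm).
# Small maximum jump suggests smooth, rounded surfaces; a large jump
# suggests a corner or open doorway.  Wraps around end-to-start.
def edge_sharpness(ranges_mm):
    """Largest jump between adjacent readings in a circular sweep."""
    wrapped = ranges_mm[1:] + ranges_mm[:1]
    jumps = [abs(b - a) for a, b in zip(ranges_mm, wrapped)]
    return max(jumps)
```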
**3. Combined Analysis & Prioritization – GoPiGo3-Specific Rules:**
* **Warm Color Dominance – High Priority:** The most critical factor will be a consistent *warm color palette*. If
I observe a prevalence of colors within the visual data, it's a strong indicator.
* **Utensil & Container Density – Significant:** The presence of a large quantity of utensils or containers,
arranged in a visually interesting way, indicates a likely kitchen context.
* **Rounded Edges - A Strong Signal:** A large number of rounded edges – notably from counter tops or cabinet sides
– strongly indicate spatial configuration within a kitchen.
* **LIDAR Confirmation – Warm Surface Texture – Secondary:** If I have a LIDAR that confirms that the wall surface
is warm in color and texture it gives a stronger affirmation.
**4. Action Based on the Data:**
* **Warm Color & Utensil Density:** "Likely in the kitchen."
* **LIDAR Confirmation + Warm Texture - More Confident:** “Likely in the kitchen.”
* **Mixed Data – Assess the Context:** "Considering - likely in the kitchen." – if the visual and LIDAR are somewhat
mixed but the overall picture leans towards it.
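Gemma3’s prioritized rules and its verdict strings fold naturally into one scoring function. A sketch with my own weights and thresholds, which are assumptions, not anything the model specified:

```python
# Combine the three cues Gemma3 prioritizes into its verdict strings.
# Weights and cutoffs below are illustrative assumptions.
def kitchen_verdict(warm_frac: float, utensil_count: int,
                    lidar_confirms: bool) -> str:
    score = 0
    if warm_frac > 0.5:      # warm color dominance: high priority
        score += 2
    if utensil_count >= 3:   # utensil/container density: significant
        score += 2
    if lidar_confirms:       # LIDAR confirmation: secondary
        score += 1
    if score >= 4:
        return "Likely in the kitchen."
    if score >= 2:
        return "Considering - likely in the kitchen."
    return "Probably not the kitchen."
```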
**5. GoPiGo3-Specific Enhancements**
* **Rotation Awareness:** The 360-degree rotation will be vital. I need to be aware of the *direction* of the
rotation – are there patterns or areas that suggest a movement *within* the kitchen?
* **Range Confirmation – Distances:** I’ll need to be careful to not exceed the range of the LIDAR, as this could
skew the result.
**To refine this analysis further, could you tell me:**
* Are there any specific kitchen *styles* you're most interested in (e.g., rustic, modern, Scandinavian)?
>>> Send a message (/? for help)
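Gemma3’s last practical caveat, not exceeding the range sensor’s limit, is worth acting on: readings at or beyond the sensor’s maximum should be discarded before any of the analysis above. A sketch; the 2300 mm limit is my assumption for a typical time-of-flight distance sensor, not a GoPiGo3 specification:

```python
# Drop sweep readings the sensor cannot actually resolve before
# feeding them to any room-recognition logic.  MAX_RANGE_MM is a
# hardware-specific assumption.
MAX_RANGE_MM = 2300

def valid_readings(sweep, max_range=MAX_RANGE_MM):
    """Keep only (angle_deg, distance_mm) pairs within sensor range."""
    return [(angle, d) for angle, d in sweep if 0 < d < max_range]
```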