Can Gemma3 LLM help a GoPiGo3 recognize it is in a kitchen?

Everyone (well, everyone in robotics, it seems) is talking about the OCTO and OM1 large language models that have been trained specifically on robotic sensors, effectors, and typical robot tasks.

OM1 is “open source,” and Google Automatic Speech Recognition allows 60 minutes per month of speech-to-text API calls, but OpenMind requires that you have funds in an OpenMind wallet and charges to use the Google Speech and Google Vision APIs through OM1, so there are no free minutes to try it all out. OM1 also needs a 16GB Raspberry Pi 5, which adds another large cost to the “free, open-source” concept.

What’s a GoPiGo3 to do?

Build a totally autonomous robot based on the following (a rough sketch of how the pieces chain together appears after the list):

  • Vosk wake-word speech recognition engine for “Hey Dave”
  • Faster-Whisper speech recognition engine for natural-language speech to text
  • Ollama Large Language Model engine running Gemma3:1b (a 1 GB model from Google DeepMind)
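
Roughly, the three pieces chain together like this. A minimal sketch, assuming the vosk, faster_whisper, sounddevice, and requests Python packages and an Ollama server on localhost:11434; the Vosk model directory, the wake-phrase handling, and the omitted utterance recording are illustrative, not a finished program:

```python
# Sketch of the "Hey Dave" pipeline: Vosk wake word -> Faster-Whisper -> Ollama/Gemma3.
# Assumes a 16 kHz mono microphone; model names/paths below are illustrative.
import json
import queue

import requests
import sounddevice as sd
from vosk import Model, KaldiRecognizer
from faster_whisper import WhisperModel

audio_q = queue.Queue()

def mic_callback(indata, frames, time_info, status):
    audio_q.put(bytes(indata))

def wait_for_wake_word(wake="hey dave"):
    """Block until Vosk hears the wake phrase in the mic stream."""
    rec = KaldiRecognizer(Model("vosk-model-small-en-us-0.15"), 16000)
    with sd.RawInputStream(samplerate=16000, blocksize=8000,
                           dtype="int16", channels=1, callback=mic_callback):
        while True:
            if rec.AcceptWaveform(audio_q.get()):
                if wake in json.loads(rec.Result()).get("text", ""):
                    return

def transcribe(wav_path="request.wav"):
    """Turn the captured utterance into text with Faster-Whisper."""
    whisper = WhisperModel("tiny.en", compute_type="int8")
    segments, _info = whisper.transcribe(wav_path)
    return " ".join(seg.text.strip() for seg in segments)

def ask_gemma(prompt):
    """Send the text to the local Ollama server running gemma3:1b."""
    r = requests.post("http://localhost:11434/api/generate",
                      json={"model": "gemma3:1b", "prompt": prompt, "stream": False},
                      timeout=300)
    return r.json()["response"]

if __name__ == "__main__":
    wait_for_wake_word()
    # Recording the follow-on utterance to request.wav is omitted for brevity.
    print(ask_gemma(transcribe()))
```

The idea is that only the tiny Vosk model runs continuously; the heavier Whisper transcription and the Gemma3 call fire only after “Hey Dave” is heard.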

So can Gemma3 LLM help a GoPiGo3 recognize it is in a kitchen?

I put this question to Gemma3, and after 46 seconds it began to suggest how it could help:

pi@RPi5Desk:~/RPi5Desk/Projects/maxheadbox/backend $ ollama run gemma3:1b
>>> you are a GoPiGo3 robot with a wide angle color camera mounted 20cm off the
… floor, and a 360 degree rotating range sensor with a 10 degree beam, mounted
… 25cm off the floor. What sensor data would you interpret to mean you are in the kitchen?

Okay, excellent! Let’s refine this analysis for GoPiGo3. As a GoPiGo3, 
I’ll approach this with a focus on detail and spatial awareness, 
leveraging the strengths of my wide-angle, rotating, and 360-degree 
sensors.

**1. Visual Data (Wide Angle Color Camera):**

*   **Color Palette – Warm & Muted:** I’d prioritize areas with a 
predominantly warm color palette – reds, oranges, yellows, browns, and 
creams. The intensity of these colors is key.
*   **Texture – Soft, Rounded, & Organic:** I’d look for textures that 
suggest warmth – think soft curves, slightly uneven surfaces, and perhaps 
a subtle gloss. Avoid sharp, angular shapes.
*   **Detail – Utensils & Containers:**  I’d be intensely focused on 
areas featuring common kitchen objects – mixing bowls, spatulas, wooden 
spoons, measuring cups, plates, and food containers.  The arrangement of 
these objects is crucial.
*   **Lighting – Soft, Diffused Light:** I'd analyze the direction and 
intensity of light, specifically noting areas with a gentle, diffused 
glow, which often signifies warm lighting conditions.
*   **Obstacles – Gentle Curves & Soft Surfaces:** I'd look for curves in 
the walls that soften into a warm hue.  I'll be particularly keen on 
areas with slightly textured surfaces – not rough, but smooth in a 
comforting way.

**2. LIDAR Data:**

*   **Wall Texture – Warm-toned Transitions:** LIDAR will be better at 
representing warmer wall transitions – a gradual gradient from a darker 
tone to a lighter one.  I need to identify areas where the wall’s texture 
subtly changes.
*   **Edge Characteristics:** I’ll seek out areas with smooth, rounded 
edges – a softer, less defined transition to the floor.
*   **Surface Material – Soft, Gradual Changes:** LIDAR will indicate 
surfaces with a gradual transition in material - i.e. warmer colors and 
softened edges.

**3. Combined Analysis & Prioritization – GoPiGo3-Specific Rules:**

*   **Warm Color Dominance – High Priority:**  The most critical factor will be a consistent *warm color palette*.  If 
I observe a prevalence of colors within the visual data, it's a strong indicator.
*   **Utensil & Container Density – Significant:** The presence of a large quantity of utensils or containers, 
arranged in a visually interesting way, indicates a likely kitchen context.
*   **Rounded Edges - A Strong Signal:** A large number of rounded edges – notably from counter tops or cabinet sides 
– strongly indicate spatial configuration within a kitchen.
*   **LIDAR Confirmation – Warm Surface Texture – Secondary:** If I have a LIDAR that confirms that the wall surface 
is warm in color and texture it gives a stronger affirmation.

**4. Action Based on the Data:**

*   **Warm Color & Utensil Density:** "Likely in the kitchen."
*   **LIDAR Confirmation + Warm Texture - More Confident:** “Likely in the kitchen.”
*   **Mixed Data – Assess the Context:** "Considering - likely in the kitchen." – if the visual and LIDAR are somewhat 
mixed but the overall picture leans towards it.

**5.  GoPiGo3-Specific Enhancements**

*   **Rotation Awareness:** The 360-degree rotation will be vital. I need to be aware of the *direction* of the 
rotation – are there patterns or areas that suggest a movement *within* the kitchen?
*   **Range Confirmation – Distances:** I’ll need to be careful to not exceed the range of the LIDAR, as this could 
skew the result.

**To refine this analysis further, could you tell me:**

*   Are there any specific kitchen *styles* you're most interested in (e.g., rustic, modern, Scandinavian)?

>>> Send a message (/? for help)

Definitely TMI for Kilted Dave at this point in his life.
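
Still, one of Gemma3’s suggestions, the “warm color dominance” check, is simple enough to try on a single camera frame. A minimal sketch, assuming OpenCV and NumPy; the hue bands and the 40% threshold are my own guesses, not anything the model specified:

```python
# Rough "warm color dominance" check from a single camera frame.
# Assumes OpenCV + NumPy; hue ranges and the 0.4 threshold are illustrative guesses.
import cv2
import numpy as np

def warm_color_fraction(frame_bgr):
    """Return the fraction of pixels whose hue falls in the red/orange/yellow band."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    # OpenCV hue runs 0..179; reds wrap around the ends, oranges/yellows sit near 10..35.
    warm = ((h < 35) | (h > 170)) & (s > 60) & (v > 60)
    return float(np.count_nonzero(warm)) / warm.size

frame = cv2.imread("snapshot.jpg")          # or a frame grabbed from the GoPiGo3 camera
if frame is not None and warm_color_fraction(frame) > 0.4:
    print("Warm colors dominate - weak hint this might be the kitchen")
```

At best it is a weak hint, which is presumably why Gemma3 wants it combined with the other cues it listed.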


I still think that simple landmarks (like an ArUco tag or similar) are the best and simplest way to indicate things like that.

Actual location recognition (i.e., “this is the living room”) requires a larger level of intelligence than the GoPiGo might be able to handle.

Viz.:

  1. Recognition of a space as “living room” even though the layout (and perhaps even the decor and/or furniture) may change.

  2. Recognition that the “living room” is a distinct but related space within a larger bounded space, the “house”.

  3. (Extra credit) Recognizing “living room”, “kitchen”, and “house” as generalized constructs that can have different configurations and be in different places.

This is a non-trivial task that has challenged even those with bottomless budgets and Ph.D. level scientists on staff.

IMHO, we should be looking to simplify instead of trying to make the GoPiGo into a DARPA open-field autonomous battle-bot. :wink:
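
For what it’s worth, reading a tag per doorway really is only a few lines. A minimal sketch, assuming OpenCV 4.7+ with the aruco module and a made-up tag-ID-to-room table:

```python
# Look up "which room am I in" from an ArUco tag on the wall or doorway.
# Assumes OpenCV >= 4.7 (cv2.aruco.ArucoDetector); the tag-ID-to-room map is made up.
import cv2

ROOM_BY_TAG = {1: "kitchen", 2: "living room", 3: "office"}   # illustrative assignments

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

frame = cv2.imread("doorway.jpg")            # or a frame from the GoPiGo3 camera
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners, ids, _rejected = detector.detectMarkers(gray)

if ids is not None:
    for tag_id in ids.flatten():
        print("Tag", int(tag_id), "->", ROOM_BY_TAG.get(int(tag_id), "unknown room"))
```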


Vision indeed! There are many paths to “understanding”, and many more possibilities when accepting cloud computing as the understander.

Some of these LLMs have the “knowledge” that, when told (by a vision-interpretation VLM such as VILA running as a cloud processor) that a side-by-side refrigerator/freezer (with a through-the-door ice dispenser) is in view, the “home robot” is likely in the kitchen, and they can even suggest “move commands” to put the robot in front of the fridge. Of course, what I have thought of as the robot becomes only a mobile sensor/effector platform with its head in the clouds, but Kilted Dave (and TB5-WaLI) has all the abilities needed to serve the new masters of the digitally connected universe.
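
Even the little local gemma3:1b will make that kind of jump if you hand it the scene description as plain text. A hedged sketch against the same local Ollama endpoint shown in the transcript above; the scene sentence and prompt wording are mine, nothing OpenMind- or VILA-specific:

```python
# Tiny "knowledge capsule" demo: give the LLM a scene description, ask which room it is.
# Assumes the local Ollama server and gemma3:1b model used in the transcript above.
import requests

scene = ("A side-by-side refrigerator/freezer with a through-the-door ice dispenser "
         "is directly ahead of the robot.")
prompt = (f"You are a small home robot. {scene} "
          "Which room are you most likely in, and what single move command "
          "(forward/backward/turn) would put you in front of the refrigerator?")

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "gemma3:1b", "prompt": prompt, "stream": False},
                  timeout=300)
print(r.json()["response"])
```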

P.S. I do not have the desire to learn how to create my own free versions of these costly commercial cloud services, but what they do offer for free are tiny “knowledge capsules” that are fun to imagine use cases for.

I have gotten to the point where I realize my dreams are inconsistent with my abilities, and with the GoPiGo3 processor options.
