Robot Carl's First Words (my new GoPiGo3 based bot)

After giving up (at least for the moment) on modernizing my 18-year-old robot Pogo, I decided to build robot Carl based on the GoPiGo3.

Here is Carl posing for his first portrait:

[image: Carl the GoPiGo3 robot]

and here is Carl in action, saying his first words:
8 second 4k video of Robot Carl’s First Words

If you want to see where I left off on “old” Pogo, here is a video of Pogo imitating a Great Egret (think a lot about moving, then move a little and think some more)

[image: Pogo]

That looks really great. Love the minion (Carl) :slight_smile:

Do you plan to integrate that with Alexa at some point?

And the way your Pogo robot was talking made me think of Dave, the alien-operated human-looking spaceship in Meet Dave - it did it the same way: thinking about moving, moving a little and then thinking some more - that’s kinda funny actually.

I started my career (40 yrs ago) not even wanting an OS - terrible NIH syndrome. Today, it is totally inefficient to ask even a 4 core 1.4GHz 1GB (split) memory “brain” to perform every feature I want in my bot, but I would like to maximize the local capabilities and then layer some cloud capabilities as non-required, enhanced abilities.

Integrating speech recognition (one of my career specialties), and using visual object recognition to enable self-managed learning and self battery-charge maintenance, are on my robot bucket list.

I’ve done some pocketSphinx testing, and need to learn to use OpenCV.

My plan for OpenCV starts with implementing one of the simple Braitenberg vehicles using evaluation of left and right average intensity of periodic images.

Later, I want to integrate some form of assisted learning of “home objects useful to a home robot” (collect periodic images with situational information while wandering, “notice” similar objects in the collected images, ask human for identification, expanding an RDF object DB). Perhaps the object recognition load needs to be split between local and cloud.

I feel that Carl needs a minimal level of intrinsic speech recognition, so I’m not rushing into Google Voice or Alexa. It will be a challenge to design a good man-machine speech interface for Carl under the heat and computational limits.

I’m hoping to use sound level measurement, plus periodic and sound-triggered image interpretation, to trigger running the speech recognition engine for short dialog sessions, then return to a low power consumption mode.
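A minimal sketch of that sound-level gate, assuming pyaudio and the standard-library audioop module are available; the RATE/CHUNK/THRESHOLD values and the run_short_dialog() hook are placeholders I would replace with a real pocketsphinx dialog session:

```python
# Sketch: watch the microphone's RMS level and only wake the (expensive)
# speech recognition engine when something loud enough is heard.
# pyaudio is assumed installed; THRESHOLD and run_short_dialog() are
# placeholders for a tuned level and a real pocketsphinx session.
import audioop
import pyaudio

RATE = 16000
CHUNK = 1024
THRESHOLD = 1500            # RMS level that counts as "somebody is talking"

def run_short_dialog():
    print("would run a short speech reco dialog session here")

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
try:
    while True:
        data = stream.read(CHUNK, exception_on_overflow=False)
        if audioop.rms(data, 2) > THRESHOLD:    # 2 bytes per 16-bit sample
            run_short_dialog()                  # then drop back to listening
finally:
    stream.stop_stream()
    stream.close()
    pa.terminate()
```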

The Raspberry Pi 3B/B+ can do a pretty good job of limited speech reco, and of non-real-time visual object recognition, if allowed to focus most of its resources on one major task at a time. The Dave/Egret model (think a lot, then act a little) will need to define Carl’s operational and interface modes.

I had high hopes for the brain transplant into the RugWarriorPro, but it ended up being three years of fiddling with hardware problems. I think building over the GoPiGo3 will allow me to get started on my robot bucket list.

And the immediate need is to familiarize myself with the distance sensor and tilt/pan servos.

My first distance sensor experiment is going to be (a rough sketch of the first couple of steps follows this list):

  • recognize a wall (three distance measurements in a line),
  • approach the wall to “wall following distance”,
  • look left and right for a “corner”,
    – if a corner is found, move into it and orient looking out along the bisecting angle,
    – if no corner, wall follow [left | right | continue], looking for a corner:
      • turn left or right to parallel the wall,
      • measure a safe movement distance,
      • point the sensor 90 degrees to the body and do the “follow” movement for the safe distance,
      • point the sensor parallel again and look for a corner.
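A minimal sketch of the first two bullets, assuming the DexterIndustries easygopigo3 Python API (EasyGoPiGo3, init_distance_sensor, init_servo, drive_cm) with the pan servo on SERVO1; the angles, tolerance, and “wall following distance” are placeholders to tune on Carl:

```python
# Sketch: recognize a wall with three distance readings, then approach it
# to a "wall following distance".  Assumes the easygopigo3 library with the
# distance sensor and a pan servo on SERVO1; all numbers are placeholders.
import time
from easygopigo3 import EasyGoPiGo3

gpg = EasyGoPiGo3()
distance_sensor = gpg.init_distance_sensor()
pan_servo = gpg.init_servo("SERVO1")

WALL_FOLLOW_DISTANCE_CM = 20        # placeholder "wall following distance"

def reading_at(angle_deg):
    """Point the pan servo (90 = straight ahead) and take one reading in cm."""
    pan_servo.rotate_servo(angle_deg)
    time.sleep(0.3)                 # let the servo settle
    return distance_sensor.read()   # centimeters

def looks_like_wall(tolerance_cm=5):
    """Three measurements in a line: slightly left, ahead, slightly right."""
    left, center, right = (reading_at(a) for a in (110, 90, 70))
    pan_servo.rotate_servo(90)
    # crude check: a flat wall roughly ahead gives three similar readings
    return abs(left - center) < tolerance_cm and abs(right - center) < tolerance_cm

if looks_like_wall():
    approach_cm = distance_sensor.read() - WALL_FOLLOW_DISTANCE_CM
    if approach_cm > 0:
        gpg.drive_cm(approach_cm)   # stop at the wall-following distance
```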

We also want to add some example projects to the GoPiGo3/DexterOS environment that involve computer vision (with OpenCV, PiCamera and Pillow).

As you said, the Raspberry Pi 3 (B+) has very limited computing power and with that you’ll only be capable of creating a very basic speech recognition system. Nowadays, if you really want to have something powerful, you’ve got to go into the realm of deep learning, but that involves lots and lots of data that only a big company can get its hands on. Fortunately, they’ve already got trained networks, so you can just take them and apply them to your scenario - you can even take a pre-trained network and further train it for your particular use-case.

Training a network on your own is quite infeasible - you need dozens of GPUs and lots of time to do the training. So the best thing is to not reinvent the wheel and take something already developed - this way, you’d be able to do more on the Raspberry Pi and offload less to the cloud.
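For example, here is a hedged sketch of what running an already-trained network on the Pi could look like with OpenCV’s dnn module; the MobileNet-SSD file names are assumptions that stand in for whichever pre-trained model files you actually download:

```python
# Sketch: run a pre-trained object detection network with OpenCV's dnn module.
# The .prototxt/.caffemodel names are assumptions -- substitute the files of
# whatever pre-trained (e.g. MobileNet-SSD) model you download.
import cv2

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

frame = cv2.imread("snapshot.jpg")                       # any camera frame
blob = cv2.dnn.blobFromImage(frame, scalefactor=0.007843,
                             size=(300, 300), mean=127.5)
net.setInput(blob)
detections = net.forward()                               # shape: 1 x 1 x N x 7

for i in range(detections.shape[2]):
    confidence = float(detections[0, 0, i, 2])
    if confidence > 0.5:
        class_id = int(detections[0, 0, i, 1])
        print("detected class", class_id, "confidence", round(confidence, 2))
```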

I had high hopes for the brain transplant into the RugWarriorPro, but it ended up being three years of fiddling with hardware problems. I think building over the GoPiGo3 will allow me to get started on my robot bucket list.

Are you actually referring to using actual mice neurons to build a network? 'Cause I had the intention to do this a couple of years ago - to make a GoPiGo3-like robot drive itself within a maze with actual neurons. The problems were the tools/equipment I would have needed to buy, the fact that I would’ve needed to repurpose the room, and the lack of time.

the Raspberry Pi 3 (B+) has very limited computing power and with that you’ll only be capable of creating a very basic speech recognition system.

When the new Pi 3B came out, with 4 cores, I did a formal speech reco comparison against the single core Pi B+. I was quite surprised and encouraged by the results.

The full report is on the element14 Community site

My github repository contains the software and result logs.

Pi3RoadTest - Speech Recognition using PocketSphinx on Raspberry Pi3

(Two years later, my wife is still laughing and reminding me that my bot “Pogo” heard “What’s the weather look like?” as “What’s the weather long quiet?”)

PogoSpeechRecoTest

Summary Test Result:

  • Speech recognition using a large language model with unconstrained audio file input on the Pi 3 is 2.4 times faster than on the Pi B+, with an identical error rate.

  • Speech recognition using a small language model with in-model microphone input on the Pi 3 is 3.72 times faster than on the Pi B+. The Pi 3 had a zero (and sometimes near zero) word error rate, while the Pi B+ showed a 37% word error rate.

  • Speech recognition using a medium size grammar with in-grammar microphone input on the Pi 3 had zero errors, while the Pi B+ showed a 3% WER. (The version of pocketsphinx used does not report performance with grammar-based recognition.) Both processors appeared to keep up with the commands in real time.
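For reference, a minimal sketch of the two kinds of runs compared above, using the old pocketsphinx-python bindings (AudioFile for file input, LiveSpeech for microphone input); the language model and dictionary paths are assumptions:

```python
# Sketch of the two run types compared above, using the pocketsphinx-python
# bindings of that era.  The lm/dic paths are assumptions; when omitted,
# the package's bundled default US English model is used.
from pocketsphinx import AudioFile, LiveSpeech

# 1) unconstrained audio file input with the default language model
for phrase in AudioFile(audio_file='test.wav'):
    print("file decode:", phrase)

# 2) live microphone input with a small, task-specific language model
for phrase in LiveSpeech(lm='small_task.lm', dic='small_task.dic'):
    print("mic decode:", phrase)
```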

Are you actually referring to using actual mice neurons to build a network?

No, but I did implement a simplistic multi-layer software neural net in 16kB of InteractiveC to drive my Pogo bot a long time ago.

My first vision project will treat a camera image as two distinct light intensity sensors. It may not even need OpenCV; it may be that numpy alone will be better for outputting intensity values in the range [0,10].
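A minimal sketch of that idea with picamera and numpy only; the tiny resolution and the scaling to [0,10] are just choices for the experiment:

```python
# Sketch: treat one picamera frame as two light-intensity "sensors" by
# averaging the left and right halves of the image, scaled to 0..10.
# Uses picamera + numpy only; the tiny resolution is just to keep it fast.
import numpy as np
from picamera import PiCamera
from picamera.array import PiRGBArray

camera = PiCamera(resolution=(64, 48))
raw = PiRGBArray(camera)
camera.capture(raw, format='rgb')
camera.close()

gray = raw.array.mean(axis=2)               # collapse RGB to intensity
half = gray.shape[1] // 2
left_intensity = round(gray[:, :half].mean() / 255 * 10)    # 0..10
right_intensity = round(gray[:, half:].mean() / 255 * 10)   # 0..10
print("left:", left_intensity, "right:", right_intensity)
```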

Braitenberg Vehicles was a really cool book on the premise that very simple and direct sensor-motor connections can produce vehicles that can exhibit recognizable synthetic “emotions”.

Here is a great presentation about the various vehicles by Chris Thornton from 2013.

I’m thinking to have a CLI interface to Braitenberg vehicles:
2a) fear,
2b) aggression,
3a) love,
4a) bi-polar neurotic, and
4b) Puppy Love

Each connects the dual light-intensity sensor values (derived from the picamera image) to the motor speed values in a different manner.

Fear direct-connects the left sensor to the left motor and the right sensor to the right motor.

Aggression cross-connects the left sensor to the right motor speed and the right sensor to the left motor speed.

Love uses inhibitory connections instead of excitatory ones.
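A minimal sketch of those three wirings, assuming the 0..10 left/right intensities from the picamera sketch above and the GoPiGo3 set_motor_dps() call; SPEED_SCALE and MAX_DPS are placeholders to tune:

```python
# Sketch: wire the two intensity "sensors" to the motors.
#   fear       (2a): direct connections  (left->left, right->right)
#   aggression (2b): crossed connections (left->right, right->left)
#   love       (3a): direct but inhibitory -- more light slows that side
# Assumes easygopigo3 and 0..10 intensities from the picamera sketch above;
# SPEED_SCALE and MAX_DPS are placeholders to tune on the robot.
from easygopigo3 import EasyGoPiGo3

gpg = EasyGoPiGo3()
SPEED_SCALE = 30    # wheel degrees/second per intensity unit
MAX_DPS = 300       # base speed for the inhibitory ("love") wiring

def drive(vehicle, left_i, right_i):
    if vehicle == "fear":
        left_dps, right_dps = left_i * SPEED_SCALE, right_i * SPEED_SCALE
    elif vehicle == "aggression":
        left_dps, right_dps = right_i * SPEED_SCALE, left_i * SPEED_SCALE
    elif vehicle == "love":
        left_dps = MAX_DPS - left_i * SPEED_SCALE
        right_dps = MAX_DPS - right_i * SPEED_SCALE
    gpg.set_motor_dps(gpg.MOTOR_LEFT, left_dps)
    gpg.set_motor_dps(gpg.MOTOR_RIGHT, right_dps)
```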

Bi-Polar combines aggression and fear based on intensity. The vehicle should initially turn toward the light source and move close, but at some intensity turn away. If the switch from aggression to fear is abrupt, the bot will be bi-polar and neurotic. (I surely don’t want a neurotic robot running around for very long.)

I think if the transition is blended, the bot should be able to end up running tight circles around the light source like a puppy running excitedly around its owner.
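And a hedged sketch of that blended transition, mixing the aggression and fear outputs with a weight that grows with total intensity; the weighting curve is only a guess to tune on the robot:

```python
# Sketch: blend aggression (seek the light) into fear (back away) as total
# intensity grows, instead of an abrupt bi-polar switch.  Intensities are
# the 0..10 picamera values; the weighting curve and SPEED_SCALE are guesses.
SPEED_SCALE = 30

def blended_speeds(left_i, right_i):
    w = min(1.0, (left_i + right_i) / 20.0)   # 0 = pure aggression, 1 = pure fear
    aggr = (right_i * SPEED_SCALE, left_i * SPEED_SCALE)   # crossed
    fear = (left_i * SPEED_SCALE, right_i * SPEED_SCALE)   # direct
    left_dps = (1 - w) * aggr[0] + w * fear[0]
    right_dps = (1 - w) * aggr[1] + w * fear[1]
    return left_dps, right_dps
```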