Google Agrees GoPiGo CARL Is Cute And Real Lovable

Seeing @strangersliver’s robot “Phil” using Google Cloud Vision Text Detection got me thinking about what it would be like to relax my “do it all local” goal for Carl. I have been working my way through a pyimagesearch book and course learning OpenCV - somewhat sporadically of late. Carl is waiting for me to learn how to do “Text In An Image” recognition in OpenCV, so that I can let him wander and find his way back to the dock.

So anyway, Carl checked out Google Cloud Vision Text Detection API on the sign above his dock:

and Google seems to agree that GoPiGo CARL is Cute And Real Lovable:

pi@Carl:~/Carl/Examples/GoogleCloudVision $ ./file_vision_text.py -i carl_dock.jpg
Texts:

"CARL
Cute and Real Lovable
"
bound: (862,930),(1345,930),(1345,1094),(862,1094)

"CARL"
bound: (864,930),(1345,936),(1343,1067),(862,1061)

"Cute"
bound: (973,1072),(1028,1072),(1028,1091),(973,1091)

"and"
bound: (1038,1073),(1081,1073),(1081,1092),(1038,1092)

"Real"
bound: (1093,1072),(1145,1072),(1145,1092),(1093,1092)

"Lovable"
bound: (1157,1073),(1249,1074),(1249,1094),(1157,1093)

Program is here:
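For anyone curious, a minimal sketch of this kind of Text Detection call with the google-cloud-vision Python client looks roughly like this (the image path and the credentials setup are assumptions, not Carl's actual script):

import io
from google.cloud import vision

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a valid service-account key.
def detect_text(path):
    client = vision.ImageAnnotatorClient()
    with io.open(path, "rb") as image_file:
        content = image_file.read()
    image = vision.Image(content=content)   # vision.types.Image on older client versions
    response = client.text_detection(image=image)
    print("Texts:\n")
    for text in response.text_annotations:
        print('"{}"'.format(text.description))
        bound = ",".join("({},{})".format(v.x, v.y)
                         for v in text.bounding_poly.vertices)
        print("bound: {}\n".format(bound))

detect_text("carl_dock.jpg")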

Very good Carl! You are really cute :grinning:
I was really surprised at how well the Google Cloud Vision API recognized clear handwritten text. You did well to keep your picture’s background really clean; Google is able to recognize small text, or portions of text in the corners, and even text that is out of focus. However, I sometimes had issues with shadows. I have the 5 MP picamera; if I am correct there is one with a higher resolution, and with that the results may be more accurate in terms of detail and focus.

1 Like

I also have the 5 MP picam. When your bot sees something GCVision doesn’t understand, the bot can just move closer to use the resolution it has.

“Use what you’ve got” is one of my robotic mottos; my pet peeve is that everyone keeps solving problems with more hardware. I think the picam could end up replacing the IMU, distance sensors, PIR motion detectors, color detectors, light level sensors, and line following sensors. Hardware is often inexpensive, and software solutions are more complex, so “hardware-itis” is understandable.

I have done a few “real time, local vision” performance tests so far, and have found temperature to be a constraint for my RPi 3B mounted on the GoPiGo3 bot using OpenCV local processing. (The geometry only allows a 5 mm heatsink.) I have to drop the resolution to 320x240 and the frame rate to under 10 fps.
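For reference, the kind of capped capture loop I mean looks something like this (a pyimagesearch-style sketch using picamera; the grayscale step is just a placeholder for the real processing):

import time
import cv2
from picamera import PiCamera
from picamera.array import PiRGBArray

# Keep resolution and frame rate low so the RPi 3B on the GoPiGo3 stays cool.
camera = PiCamera(resolution=(320, 240), framerate=10)
raw = PiRGBArray(camera, size=(320, 240))
time.sleep(2)   # let the camera warm up

for frame in camera.capture_continuous(raw, format="bgr", use_video_port=True):
    image = frame.array                               # BGR numpy array, ready for OpenCV
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # placeholder "vision" step
    # ... real detection work would go here ...
    raw.truncate(0)                                   # reset the buffer for the next frame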

I have not tested single-frame local vision, except enough to learn not to try it when Carl is on his dock - the rapid fluctuation in load can confuse the “smart charger” into switching to trickle-charging mode, which causes Carl to dismount early.

My original plan was to have Carl wander the house collecting snapshots of “interesting objects”, then analyze the photos while he was sitting on his dock charging, but that is not going to work. I will need to test sending them up to Google Cloud Vision Label API while he is on the dock.
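That test should be essentially the same call pattern as the text example above, just against the Label API - something like this sketch (the snapshot filename is made up):

from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("snapshot_0042.jpg", "rb") as f:       # hypothetical snapshot Carl collected
    image = vision.Image(content=f.read())       # vision.types.Image on older clients

response = client.label_detection(image=image)
for label in response.label_annotations:
    print("{:20s} {:.2f}".format(label.description, label.score))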

The end goal is to build an RDF DB of found objects in the bot’s environment, with context-graph linkages including “when and specifically how I saw this object”, an inferred “room where I saw this object”, and a “position of object from room reference point” that could be used for localization, as well as for human-machine dialog.
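To make that concrete, here is a hypothetical rdflib sketch of what one “sighting” might record - every name, namespace, and value below is made up for illustration:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

CARL = Namespace("http://example.org/carl#")   # made-up namespace
g = Graph()

# One hypothetical sighting: what was seen, when/how, inferred room, and position.
sighting = CARL.sighting_0001
g.add((sighting, RDF.type, CARL.ObjectSighting))
g.add((sighting, CARL.objectLabel, Literal("trash can")))
g.add((sighting, CARL.seenAt, Literal("2020-05-01T00:41:00", datatype=XSD.dateTime)))
g.add((sighting, CARL.seenWith, Literal("picam snapshot, 320x240, while wandering")))
g.add((sighting, CARL.inferredRoom, CARL.Kitchen))
g.add((sighting, CARL.bearingFromRoomReferenceDeg, Literal(37.5)))
g.add((sighting, CARL.distanceFromRoomReferenceMm, Literal(1250)))

g.serialize(destination="sightings.ttl", format="turtle")   # write the little graph out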

I don’t want Carl to depend on cloud processing, and I don’t want to have to start paying a subscription for Carl. He already costs me $5-$8 per month for batteries; so far the longest-lasting batteries have managed three months running 24/7. Yesterday I was thinking of using GCVision as a bootstrap knowledge-acquisition method that would eventually enable local processing with the DB. (I probably will not get to this goal before my GCVision trial funds expire.)

The possibilities are endless, but my abilities and dedication are finite.

1 Like

But the more you work on it, the better/greater they become.

My “Remote Camera Robot” joystick project has taught me more about JavaScript, and about how to interface real hardware to a browser instance, than I could possibly have imagined.  So much so that I am seriously considering abandoning several projects that required a display and wireless keyboard/mousepad to monitor and interact with Charlie - The Bug Mobile.

It has been a revelation!  The amount you can do using a simple browser and some JavaScript is - (collects jaw off the floor) - astounding!  So long as you have a network connection to your 'bot, there’s literally NOTHING  you can’t do.

Not that the keyboard/mousepad and display will go to waste, there are still ideas I have that use them.  It’s just that there are several dozen potential technical hardware issues that I’ve been modelling in the back of my mind - searching for a solution - that JavaScript and a browser totally obliterate.  The software won’t be trivial, and the learning curve is a lot like the sheer face of Half-Dome in Yosemite National Park, (or the walls of Bryce Canyon), but once I figure this stuff out and get a few re-usable modules written, it’s all over but the shouting.

Like the old ‘70’s tee-shirts used to say:  Keep On Truckin’!

1 Like

song by Tom T. Hall: Who’s Gonna Feed Them Hogs

"Four hundred hogs, they just standin’ out there
My wife can’t feed ‘em and my neighbors don’t care
They can’t get out and roam around like my old huntin’ dogs
Here I am in this dang bed and who’s gonna feed them hogs?"

1 Like

Never heard the song before, but it sounds like a hardware problem to me.  Invest in an auto-feeder like they use for cattle. :wink:

BTW, though far be it from me to add to your already overwhelming reading list, I’m thinking that this might be a way for you to interact with Carl over longer distances.

Once I get this figured out, I’m going to try to post some documentation for this that will translate a lot of the technobabble into plain language that can be used by people who want to actually make this stuff work.

I used to have a port tunnel through my router so I could play with my RPi on my lunch hour. These days Carl gets to sleep if I’m going to be at a “longer distance”. Retirement has been good for Carl.

1 Like

Sorry, I mis-spoke.

What I was trying to say is that, by using JavaScript in a browser, you can create an interactive model - monitoring parameters for example - while Carl is doing something in the other room.

Example:
Say your wife is asleep, (for whatever reason), and you want to work with Carl. Also assume it’s not “quiet-hours” so Carl’s voice is active.  You want to do something “interactive” where Carl does something and you monitor how he does it.  Rather than have Carl mouth every step, “Getting on dock”, “Staring at my dock”, “Decided to punt the idea of getting on my dock”, you can monitor via a browser - with the voice silenced when the browser monitor is being used.

Or a hundred other ideas.

In my case, modding up the Remote Camera Robot project, I can use the joystick to move Charlie to wherever I want.  Then, by using various buttons or joystick axes, I can have Charlie do whatever I want.

  1. Volume slider on the throttle causes the camera POV to zoom in or out.
  2. The three position “mode” switch on the joystick can be used to allow Charlie to either travel, or use the joystick to manipulate a movable arm/claw.
  3. Various buttons can be used to do things like turn on a light, turn on an IR light, make a noise, or whatever I want.
  4. I can, ultimately, use a button selection to place him into “autonomous” mode, or reestablish direct control.

And so on.

The fact that it’s on your network means that you can set it up to be reachable wherever you might be if that’s desirable, or not as you choose.

1 Like

Based on the following table, Google Cloud allows you to use their APIs free of charge if you stay under certain limits:
https://cloud.google.com/free/docs/gcp-free-tier#always-free-usage-limits
I believe that for educational, non-commercial purposes we should be covered.

2 Likes

Correct me if I am wrong, but I believe he wants Carl to be completely  autonomous, with a minimum of external dependencies.

Also, given Google’s track record for privacy and data-mining, I’d rather not depend on them for fundamental aspects of my own 'bot.  I’m sure he’d be absolutely thrilled  if Carl started shouting commercials because he didn’t subscribe to the “premium” version of the API’s!

1 Like

Now that you mention it, I do remember reading that along the way.

Like many, I have a mixed emotion relationship with Google.

  • I don’t really want my robot wandering around sending pictures from inside my house up to Google - not sure what anyone could do with them, but I just don’t want a picture of me in my underwear, tripping over Carl in the middle of the night, to end up on anyone’s computer.

  • Google changes things - that is their right, but it creates a maintenance issue for Carl.

    • I first signed up for Google Cloud years ago when they released the Google AIY Voice project. I built the cute little box with dual far-field microphones, a speaker, and a giant red button on top.
    • It worked nicely for about a month. Then they changed their API, which meant I needed to do a special Google OS version update; I couldn’t use the standard Raspbian update/upgrade procedure.
    • After the novelty wore off, and my attention was on Carl, the AIY Voice box stopped working again. Google had changed the authentication API. It wasn’t until I started working with you debugging the Pi2 illegal instruction thing that I figured out how to properly set up the authentication and enable the needed APIs. (Thank you, for real.)
    • After I get something working for Carl, I need to move on to the next challenge without old code breaking mysteriously one night while I’m asleep. Google will always be changing, but Carl needs firm ground.
  • I feel that Carl should first be self-reliant, and then perhaps augment with Google. It’s something in me.

    When I first started programming microprocessors (1976) I was writing runtime libraries in assembly language. The female engineer writing the Pascal compiler (in Pascal) that would use my runtime libraries, told me she did not need or want to know what assembly language her Pascal source code created. I on the other hand could not help myself wanting to see what assembly language her compiler was putting out. To this day, my obsession with details discomforts me when using other peoples’ code, and “Cloud Services” are the most uncomfortable it can get. This attitude slows me down, and makes my life harder, but I accept the limitation and hindrance.

My curiosity keeps me playing with Google APIs, and admiring what smart folk with fewer hangups accomplish and think up. Your demo is so cool. I really like what you thought up.

1 Like

Nothing wrong with that!

I also have the AIY kit that I bought a year or two ago.  Played with it and got it working.

I had a few problems with it:

  1. Unless you used Google’s cloud API, it didn’t sound right.  Don’t know if it was deliberate or not, but no matter what you tried, it sounded awful  unless you used THEIR  API.  Maybe that’s the way it was, but I smelled something fishy about that. . .
  2. Right around this time a major scandal broke about how Google was data-mining everyone, like it or not.  (And the EU was planning on suing Google up the [donkey] over this.)  I, like you, didn’t like the idea that Google could “collect data”, (make that read “snoop”), from MY  device, WITHOUT  my consent.
  3. Google made a total balls-up of some research I was doing because they decided (:crazy_face:) that a “minor” update wouldn’t be too big a problem.  Cost me several days of angst and grief before I - accidentally! - stumbled on the truth.  That was it - I was done  with their BS and was tempted to smash the AIY kit, box and all, against a wall. (:face_with_symbols_over_mouth:)  Packed it away and didn’t drag it out until a month ago to yank the Pi out of it.  I’m seriously thinking of using the hat, speaker, and two mics to add sound detection and a voice to Charlie.  Maybe the big button can be used as another impact sensor?

I agree that you can’t build on quicksand, especially when you’re working on a dev project, and I really don’t think it’s that unreasonable to expect my software to work tomorrow the same way it worked today. . .

</rant!>

1 Like

I totally agree with you guys about privacy, and I’m not a big fan of the big players - except maybe the one owned by Elon Musk - but realistically, for reaching even tiny goals in the AI world, it is better to re-use something that is already done. I had to change my own attitude to accept that, but in the end I understand that a single person cannot compete with a huge data center unless you have a lot of money. So now I am more oriented toward the cloud, and toward using their computing power to reach my goals.

1 Like

@strangersliver, it is very exciting to have you here, flying “GoPiGo Phil” into that cloud.

Additionally, any progress you make combining IMU data with encoders will be of great interest.

Carl has a wheel logger always running that records rotations and linear distance travelled based on the encoders. Carl has an IMU mounted but I have not created the comparable IMU logger nor attempted to fuse the encoders and IMU data.
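When I do get to it, my first cut will probably be something simple like a complementary filter on heading - a sketch of the idea only (the blend factor and the update rate are assumptions; nothing here has run on Carl):

ALPHA = 0.95   # how much to trust the gyro-integrated heading each update (assumed)

def fuse_heading(fused_deg, gyro_rate_dps, encoder_heading_deg, dt_s):
    """Blend IMU gyro rate (good short-term) with encoder heading (good long-term)."""
    gyro_estimate = fused_deg + gyro_rate_dps * dt_s
    # pull toward the encoder heading along the shortest angular path
    error = (encoder_heading_deg - gyro_estimate + 180.0) % 360.0 - 180.0
    return (gyro_estimate + (1.0 - ALPHA) * error) % 360.0

# hypothetical use inside a 10 Hz logger loop:
# fused = fuse_heading(fused, imu_yaw_rate_dps, wheel_log_heading_deg, 0.1)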

Here is Carl’s motion summary today:

pi@Carl:~/Carl $ ./totalwheel.sh 
Total Travel:  645121 mm 2116.5 ft
Total Rotate:  576346 deg 1600.9 revolutions
Total Motion:  15511.7 sec 4.308 hrs
Total Life:  9215.84 hrs   percentInMotion: .04

but it is obvious from the data:

2020-05-01 00:40|[wheellog.py.logMotionStop]travel:     0.6 rotation:   177.9 motion:     2.4 sec
2020-05-01 00:40|[wheellog.py.logMotionStop]travel:     0.8 rotation:   177.0 motion:     2.3 sec
2020-05-01 00:41|[wheellog.py.logMotionStop]travel:  -203.6 rotation:    -0.8 motion:     2.7 sec

that when he is turning 180 degrees to look at his dock, and then another 180 degrees to prepare to back onto the dock, Carl is recording these turns with +/- 2 degree accuracy, for a loss of about 23 rotations in his “lifetime” so far.

Are you planning to use GCVision with your IMU and encoders in some way?

Absolutely true.

In Python one of the rules is “DRY” - Don’t Repeat Yourself - unless you’re a class, then you should create a hot-key that inserts “self.” every time you hit it! ( :roll_eyes:)

I also like “DRW” - Don’t Reinvent [the] Wheel.  If you can shamelessly steal, (ahem! “borrow” or “re-use”), someone else’s code that already does what you want to do, it’s silly to re-write everything unless there’s a darn good reason.

I remember years ago when a friend and I attended the Trenton Computer Festival and Richard Stallman was there giving a presentation about “Open Source”/“FOSS”.  (BTW, he absolutely HATES  the term “Open Source”.)


Richard Stallman at the Trenton Computer Festival, April 26, 2010
 
According to him, if you have a computer, it should run absolutely NOTHING except true FOSS software, even if that means that significant and important functionality is missing or seriously compromised.

I stood up and mentioned to him that this was all well and good - in theory - but what if you actually had REAL WORK to do that could not be accomplished without some non-free software or driver?  (i.e. a proprietary “blob” from NVIDIA so you could render graphics properly at a reasonable frame-rate.)

I told him that not only is it completely unreasonable for someone to hang himself out to dry by being pedantic about “pure” FOSS - it gives the (gasp!) Open Source community a bad name; making us all look like a bunch of self-centered brats/twits who won’t play unless you play MY way.  I also mentioned that - as the “Haid-Man-Boss” of the FSF - his point of view was understandable.  However, the Real World doesn’t work that way.

So, sometimes you gotta do what you gotta do. . . .

Thank you for sharing this code and information. I was monopolized by my kids for most of the week; they have started online school lessons. Here in Italy things are going much better with the contagion - fewer coronavirus cases every day - but in-person school is finished and classes continue remotely.

Finding a way to let Phil go around my house and map the rooms is one of my goals as well, but I’m still at the beginning.