GoPi5Go-Dave Upgrades His Eyes

Humble Dave never integrated his Kickstarter Oak-D-Lite “eyes”. Create3-Wali “borrowed” the Oak-D-Lite camera, which proved to be kryptonite in disguise (the Create3 crashed when the camera began publishing its data).

Things I learned about the Oak-D-Lite through all that:

  • The Kickstarter Oak-D-Lite camera did not have the IMU chip installed
    (Create3-Wali had its own IMU so this was not an issue, but Humble Dave’s
    IMU failed, so WIBNI - wouldn’t it be nice if the camera had one).
  • The stereo depth resolution was vastly lower than the RGB camera resolution,
    producing fine visual detail without correspondingly detailed depth information.
  • The vertical field of view was too narrow to see an entire person in a home setting.

GoPi5Go-Dave is getting new “eyes” with almost double the vertical FOV and almost double the horizontal FOV, matched stereo depth and RGB resolution, and a 9-DoF IMU.

GoPi5Go-Dave should be able to see much more, and better understand what he is seeing as well.


Cool - what “eyes” will Dave be using?
/K


When I asked Luxonis about upgrading my $99 Kickstarter Oak-D-Lite (which does not have the IMU) to the production Oak-D-Lite ($149), they responded that there is no “upgrade program”, but gave me a $60 gift card to apply toward purchasing a new camera.

Rather than pay $90 for a second Oak-D-Lite and really only gain an IMU, I decided to splurge on the “wide angle - global shutter” Oak-D-W-97 so that Dave can see his environment at close to the volume we see (albeit at much lower resolution than we do). This ends up being a serious $350 investment in Dave.

I so much want to explore what is possible for an autonomous robot using vision:

  • Localization by recognizing images seen before (RTABmap)
  • Visual obstacle avoidance (RTABmap with Nav2)
  • Behavior-tree robot reasoning architecture
  • Object recognition of everything “seen” in a home environment (YOLO)
  • Understanding the purpose of those objects (RDF triplestore DB) - see the sketch after this list
  • Robot-human dialog about what Dave knows and has recently learned (ChatScript/ChatGPT?)
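
As a hedged illustration of the “purpose of objects” idea, here is a minimal rdflib sketch; the dave: namespace, the object names, and the usedFor/locatedIn properties are hypothetical placeholders, not anything Dave actually runs:

```python
# Minimal sketch: recording "purpose of an object" facts as RDF triples.
# The dave: namespace and the usedFor/locatedIn properties are hypothetical.
from rdflib import Graph, Literal, Namespace

DAVE = Namespace("http://example.org/dave#")
g = Graph()
g.bind("dave", DAVE)

# Facts Dave might record after YOLO reports a "cup" in the kitchen
g.add((DAVE.cup, DAVE.usedFor, DAVE.drinking))
g.add((DAVE.cup, DAVE.locatedIn, DAVE.kitchen))
g.add((DAVE.cup, DAVE.lastSeen, Literal("2025-03-01T10:15:00")))

# Later, the dialog layer could ask: what is a cup for?
for purpose in g.objects(DAVE.cup, DAVE.usedFor):
    print(f"A cup is used for: {purpose}")
```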

I was so hoping the Create3 would let me get started on that dream (rather than on the platform itself). I’m back to working on the platform (GoPiGo3), but hoping to get this all sorted quickly and begin making progress on the dream.


Sounds like a great project. And certainly the goal of a lot of robotics programmers. But it does sound like you’ll need some fairly beefy back-end processing power (even if the Oak-D does a lot of the initial processing locally).
/K


Humble Dave used 100% of his Pi4 with LIDAR as the primary sensor (around an 18 KB/s data rate), and Create3-Wali used 50% of a Pi5 with the Oak-D-Lite (14 MB/s) and no LIDAR sensing.

By switching to 13x less RGB data and 2x more depth data on the wider-view Oak-D-W-97, I am expecting the Pi5 will be up to the job. (The GoPiGo3 power supply scares me, though - it was not designed with a Pi5 in mind.)
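
For the curious, here is the back-of-envelope arithmetic behind that 13x figure; the sensor modes below are my assumptions for illustration, not measured values:

```python
# Back-of-envelope: RGB pixels per frame on each camera.
# Assumed sensor modes (illustrative, not measured):
#   Oak-D-Lite RGB: 4208 x 3120 (~13 MP)
#   Oak-D-W-97 RGB: 1280 x 800  (~1 MP)
lite_rgb = 4208 * 3120
wide_rgb = 1280 * 800
print(f"{lite_rgb / wide_rgb:.1f}x fewer RGB pixels per frame")  # ~12.8x

# The depth ratio depends on which stereo modes are compared, but doubling
# the depth pixels at the same frame rate doubles the depth data rate.
```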

I’m still hoping someone will produce a well-tested, well-supported $1000 educational robot (Create4?), but I don’t see it happening for a couple of years.


And Dave can SEE more, better:

[side-by-side photos comparing the old and new camera fields of view]

Perhaps not obvious: 13 MP over a smaller area will look more beautiful to human eyes than 1 MP over a wider area, but how many of those pixels are actually needed for Dave to understand what he is looking at? The camera is for Dave, not us humans.
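
To put a rough number on that, assuming the on-camera YOLO network takes a 416x416 input (a common YOLOv4-tiny size - an assumption, not a measured figure):

```python
# How many of the 13 MP actually reach the object detector?
sensor_pixels = 4208 * 3120   # ~13 MP Oak-D-Lite RGB (assumed mode)
yolo_pixels = 416 * 416       # pixels the detector actually sees (assumed input)

print(f"The detector uses {yolo_pixels / sensor_pixels:.1%} of the sensor pixels")
# ~1.3% -- the other ~98.7% are downscaled away before recognition,
# which is why a wider 1 MP view can serve Dave as well as 13 MP.
```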


Pics look great.

That would be great. Seems like there’s clearly a need - lots of high schools and colleges have robotics programs.
/K


Back when I was tasked with building interactive voice response (IVR) systems with speech recognition, it was often said that humans understand only 50% of words when they do not know the context.

In my dream for Dave to understand what he sees, I have been planning to use transfer learning to enhance visual neural nets with objects he has “discovered”, but the name “Grok Vision” in the announcement by Elon Musk’s xAI company reminded me that integrating context into the neural net would likely improve object recognition significantly.

Since the YOLOv4 object recognition program running on the Oak cameras only uses 8% of the RPi5, and the Oak camera can perform the recognition at 30 FPS, there should be capacity for the next generation of vision recognition algorithms when the smart people at Luxonis catch up to Musk.
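
For anyone curious what “recognition running on the camera” looks like, here is a minimal DepthAI-style sketch; the blob path, class count, and thresholds are placeholders for a real model (anchors/masks may also be needed depending on the blob), and this is an illustration rather than Dave’s actual code:

```python
# Minimal sketch: on-camera YOLO inference with the depthai Gen2 API.
# Blob path, class count, and thresholds are placeholders.
import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(416, 416)        # match the network's input size
cam.setInterleaved(False)

nn = pipeline.create(dai.node.YoloDetectionNetwork)
nn.setBlobPath("yolov4_tiny.blob")  # placeholder model file
nn.setConfidenceThreshold(0.5)
nn.setNumClasses(80)                # placeholder: COCO-sized class set
nn.setCoordinateSize(4)
nn.setIouThreshold(0.5)
cam.preview.link(nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("detections")
    while True:
        # Inference runs on the camera; the Pi just reads the results.
        for det in q.get().detections:
            print(det.label, f"{det.confidence:.2f}")
```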
