ROS2 vision object detector

Dear all,

If you are looking for a vision object detector (it’s MobileNet) integrated into the ROS2 system I have something that may be of interest:

GitHub link to ros2_coco_detector

It should be of interest to anyone running ROS2 on a Dexter Industries product (or other robot) with a camera fitted, who is interested in object detection.


Julian, this is very good - even better than that!

I was also seeing issues with uncompressed images, and ended up publishing only compressed JPEG at 0.1 Hz to minimize network load.
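The throttling logic behind that is just a rate limiter. Here is a minimal pure-Python sketch of the idea (the class and names are mine, not from any ROS package), which could wrap a publish call so that only one frame per period gets through:

```python
import time

# Toy rate limiter matching the "publish at 0.1 Hz" idea above.
# Frames arriving before `period` seconds have elapsed are dropped.
class Throttle:
    def __init__(self, hz):
        self.period = 1.0 / hz
        self.last = float("-inf")  # so the very first frame always passes

    def allow(self, now=None):
        """Return True if a frame should be published at time `now`."""
        now = time.monotonic() if now is None else now
        if now - self.last >= self.period:
            self.last = now
            return True
        return False
```

In a ROS2 image callback you would then publish only when `throttle.allow()` returns True, letting the rest of the frames fall on the floor.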

Additionally I love your:

The ROS2 documentation suggests that the ObjectHypotheses.class_id should be an identifier that the client should then look up in a database. This seems more complex than I have a need for. So this implementation just places the class label here directly, e.g. class_id = "dog".

On my Create3-WaLi robot, I am running the Luxonis DepthAI ROS node, which broadcasts only the numerical class id. I hate that: I have to remember whether a given launch file configured MobileNet (20 objects) or YOLOv4 (80 objects), and then look up the class_id number, which differs between the two for the chair/person/bicycle/bird/table/sofa objects typical in my home.
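A small lookup table per model is one way to stop having to remember which network was loaded. The sketch below is purely illustrative: the id-to-name orderings are my recollection of the usual VOC and COCO conventions, not something pulled from the DepthAI node, so verify them against the labels your launch file actually configures.

```python
# Hypothetical per-model label tables so a bare numeric class_id can be
# turned into a human-readable label. Orderings below are illustrative
# (typical VOC-20 and COCO-80 conventions), not authoritative.
MOBILENET_VOC_LABELS = {
    2: "bicycle", 3: "bird", 9: "chair", 11: "diningtable",
    15: "person", 18: "sofa",
}
YOLO_COCO_LABELS = {
    0: "person", 1: "bicycle", 14: "bird", 56: "chair", 57: "sofa",
}

def label_for(class_id, model):
    """Map a numeric class_id to a label, given which detector is running."""
    table = MOBILENET_VOC_LABELS if model == "mobilenet" else YOLO_COCO_LABELS
    return table.get(class_id, f"unknown({class_id})")
```

With something like this in the detection callback, the log can say "chair" regardless of whether the launch file picked MobileNet or YOLOv4.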

Bird and bicycle are test images I put in front of the camera with the robot in the other room and then run back to the desktop to catch the detection class and look up the number. One night I stupidly left the Bird/Bicycle/Person page with images in front of WaLi and went to bed. My wife woke me up at 3:30AM telling me WaLi was in distress. Because the paper had fallen to cover part of WaLi’s dock, he actually pushed the dock a meter from where it was taped to the floor, and was half climbing the wall trying to get onto the dock. He left rubber on the floor spinning his wheels against the wall.

As for detection thresholds, I find your values for MobileNet are similar to mine, but YOLOv4 rarely gets above 50-60% confidence for positive detections.

Google supposedly has a cloud recognizer for “Home Objects”, but I certainly don’t plan to send any images from inside my home across the net. My “end game dream” is that my robot will wander the house with VSLAM and record a few images for later object analysis. Then, when sitting on the dock recharging, he will run a custom “My Latest Home Object Detector”, removing any recognized object from the image, then run some sort of “Unknown Object Segmentation Net” on the remaining image and fire off an email to me with an unknown image segment, to which I could reply with a label, WordNet morphology, and relations. A more complex and totally insecure option would be for WaLi to run Google Image Search, but I rather like the idea of WaLi asking me for knowledge.

Eventually I would like to have a conversational interface (I did speech recognition and speech generation for a living prior to retirement) so I could ask “How many known objects are in your knowledge base?” and “How many objects are awaiting classification?”, since I would want to limit the “What is this?” emails in my inbox each day. It would be interesting to ask for stats such as “What object has the most detections?”, “What object did you see yesterday with the least detections?”, “What object has the most relationships?”, or “What objects have relationships shallower than x deep?” (If I added a display to WaLi, he could detect when I come near, ask if I have time to help him classify a few images, and then display an unknown. The problem is having unknown objects in the speech recognition vocabulary; lacking a context, the word error rate is around 50% anyway, so the email will probably remain. The problem with email, though, is that it leaves the house with the image.)

I’m not familiar with RoboStack - seems like another area for investigation I need to add to my TODO list.

Good job on your detector - I like it a lot.



Thank you for your comments. I had a look at Luxonis DepthAI and was looking through their documentation, it seems to suggest that the object recognition software is for the OAK cameras only. Is that correct? Or does it work on any camera?

I am also very interested in VSLAM. Around October/November I spent some time trying to get ORB_SLAM3 to work. I did get it all to work (as in compile and run), and I also (eventually) got it to work with ROS2. However I did not get good results. I could not make any meaningful sense of the PointClouds and it didn’t seem to localize well. I am not that familiar with this type of VSLAM, so who knows, maybe I made some poor configuration decisions.

I saw a post you wrote some time back about whether it would be possible to use visual detections to localize the robot. I have been thinking along the same lines (since my ORB3 failure).

About 10 years ago, I had a robot that would drive over a world map (the map was preprogrammed and given to the program) and, using just a colour sensor, localise itself on the map. The only information the robot had was blue or yellow (sea and land respectively). Of course it initially couldn’t possibly know where it was (beyond quickly establishing whether it was on land or sea), but over time the sequence of colour readings would resolve to only a small number of locations consistent with that sequence. I received a lot of comments from people who were surprised at how well it worked given how little information was being received at each time step. There is a YouTube video of this: HMM Localization. The sound quality is really dreadful, so you may want to turn the volume right down. (A neighbour’s bird was squawking…)
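For anyone curious how little machinery that kind of localiser needs, here is a toy sketch of the idea on a 1-D ring map. Everything here is my own simplification (the map string, the 0.9 sensor accuracy, and a deterministic one-cell-forward motion model are all assumptions), but the shift-then-weight loop is the standard HMM forward update:

```python
# Toy HMM localisation: the robot senses only "L" (land/yellow) or
# "S" (sea/blue), moves one cell forward per step, and starts with a
# uniform belief over all positions on a preprogrammed ring map.
WORLD = list("LLLSSLSSSSLLSLLSSSSS")  # hypothetical map
N = len(WORLD)
P_CORRECT = 0.9                       # assumed colour-sensor accuracy

def localize(readings):
    """Return the belief distribution after a sequence of colour readings."""
    belief = [1.0 / N] * N            # uniform prior: could be anywhere
    for z in readings:
        # Motion update: everyone's hypothesis advances one cell.
        belief = [belief[(i - 1) % N] for i in range(N)]
        # Measurement update: weight each cell by how well it explains z.
        belief = [b * (P_CORRECT if WORLD[i] == z else 1.0 - P_CORRECT)
                  for i, b in enumerate(belief)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief

# After enough readings, only positions consistent with the whole
# sequence retain significant probability mass.
belief = localize(list("LLSSLSSS"))
best = max(range(N), key=lambda i: belief[i])
```

Any single reading is nearly useless, but the product of eight of them already singles out one position on this map, which is exactly the effect that surprised people in the video.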

I have been thinking of a more modern version of that, i.e. an object detector can provide much more information about location. This would not be SLAM; I am thinking the locations of objects would have to be preprogrammed. I have some code that works in a very limited fashion and suggests I might be on the right track, but nothing demo-able at the moment. It is not trivial (at least for me), so it might be a multi-month project.
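To illustrate the core geometry of that idea (this is my own sketch of one piece of it, not the code mentioned above): if object positions are preprogrammed on a map, then a single recognized object plus a range/bearing estimate pins down the robot's position, given a known heading. The object names and coordinates below are hypothetical.

```python
import math

# Hypothetical preprogrammed map: label -> (x, y) in metres.
OBJECT_MAP = {"dog_poster": (4.0, 2.0), "cat_poster": (0.0, 5.0)}

def robot_position(label, rng, bearing, heading):
    """Robot (x, y) from a detection of a known object.

    rng     - estimated range to the object (metres)
    bearing - angle to the object in the robot frame (radians)
    heading - robot heading in the map frame (radians)
    """
    ox, oy = OBJECT_MAP[label]
    angle = heading + bearing  # direction from robot to object, map frame
    return (ox - rng * math.cos(angle), oy - rng * math.sin(angle))
```

In practice range, bearing, and heading are all noisy, so each detection really yields a fuzzy region rather than a point, and fusing several detections over time (much like the colour-sensor HMM) is where the real work lies.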

If you are wondering what the cat and dog posters are doing in that YouTube video I published, I purchased them just before Christmas, and they are part of the … localisation system… I’ve since got sidetracked on polishing up the brickpi3 and coco_detector packages.

Sorry for long post, but hope it may be of interest.


Yes, the Oak-D-Lite has three cameras and a processor that can run several net pipelines simultaneously. Depth from the twin mono cameras is output for every pixel, and you can run object recognition on the RGB camera, all within the device.

I am still learning how to configure the ROS driver for the Oak-D-Lite, but I did actually (once!) get a 3D point cloud overlaid on the RGB image in RViz2 using RTABmap. (My Create3 robot is brought to its knees when I run RTABmap on the Pi5, so I am having to learn how to isolate the Create3 from unneeded FastDDS topics.)

RTABmap seems to be the winner for open-source VSLAM, and it does not consume all of the Pi5’s processor, so I think I will actually be able to use it on the bot if I can get the RMW middleware configuration to stop killing my Create3.

RTABmap works by collecting an image tree as the bot moves; when it notices a loop closure, it can begin doing localization and correction of the map.

The concept of localizing by recognized objects on a map does not have an out-of-the-box package yet. I think the VSLAM concept is solid, but object detection may require auto-updating object detectors. Supposedly RTABmap has an image feature extractor as its “object detector”.



Just a reply to Cyclical’s question on RoboStack. It’s really just a set of Conda packages for installing the ROS2 software. I can go into more detail if you wish, but if you have an installation process for your software stacks that you are happy with, I don’t think it will add any value; there’s no new functionality. Personally I use it because I have found it quite complex to get several software stacks set up correctly (and not fighting with each other…).
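For a concrete picture, a RoboStack setup looks roughly like the fragment below. This is a sketch from memory (environment name is mine; check the RoboStack docs for the current channel and package names before running anything):

```shell
# Rough sketch: ROS2 arrives as ordinary Conda packages, so it can live
# in one environment alongside PyTorch etc. Names may have moved on.
mamba create -n ros_env -c conda-forge -c robostack-staging ros-humble-desktop
mamba activate ros_env
mamba install -c conda-forge pytorch torchvision
```

The point is simply that everything lands in one isolated environment, which is what keeps the stacks from fighting with each other.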

The repositories that I write don’t depend in any way on RoboStack; it’s just a suggested way to get valid installations of dependencies like ROS2 and PyTorch.

I have a question on RTABmap, but I’ll post on a separate topic.
