Julian, this is very good - even better than that!
I was also seeing issues with uncompressed images, and ended up publishing only compressed JPEG at 0.1 Hz to minimize network load.
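For anyone wanting to do the same, a throttled publisher like that is only a few lines. This is a minimal sketch assuming rclpy and OpenCV; the node name, topic, and camera source are illustrative stand-ins, not anything from Julian's package:

```python
# Minimal sketch of a throttled compressed-image publisher (0.1 Hz),
# assuming rclpy and OpenCV. Names and the camera source are illustrative.
import cv2
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import CompressedImage


class ThrottledJpegPublisher(Node):
    def __init__(self):
        super().__init__('throttled_jpeg_publisher')
        self.pub = self.create_publisher(CompressedImage, 'image/compressed', 1)
        self.cap = cv2.VideoCapture(0)                # hypothetical camera source
        self.create_timer(10.0, self.publish_frame)   # 10 s period = 0.1 Hz

    def publish_frame(self):
        ok, frame = self.cap.read()
        if not ok:
            return
        # JPEG quality ~80 keeps each frame small on the wire
        ok, jpeg = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
        if not ok:
            return
        msg = CompressedImage()
        msg.header.stamp = self.get_clock().now().to_msg()
        msg.format = 'jpeg'
        msg.data = jpeg.tobytes()
        self.pub.publish(msg)


def main():
    rclpy.init()
    rclpy.spin(ThrottledJpegPublisher())


if __name__ == '__main__':
    main()
```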
Additionally, I love your:
The ROS2 documentation suggests that the ObjectHypotheses.class_id should be an identifier
that the client should then look up in a database.
This seems more complex than I have a need for.
So this implementation just places the class label here directly, e.g. class_id = "dog".
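For anyone reading along, that direct-label approach looks something like this. It is just a sketch, assuming the Humble-era vision_msgs layout (ObjectHypothesisWithPose wrapping an ObjectHypothesis with a string class_id; older distros used a plain id field instead), with illustrative values:

```python
# Sketch of putting the label straight into class_id, per Julian's approach.
# Assumes Humble-era vision_msgs message layout; values are illustrative.
from vision_msgs.msg import Detection2D, ObjectHypothesisWithPose

det = Detection2D()
hyp = ObjectHypothesisWithPose()
hyp.hypothesis.class_id = 'dog'   # human-readable label, no database lookup
hyp.hypothesis.score = 0.87
det.results.append(hyp)
```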
On my Create3-WaLi robot, I am running the Luxonis DepthAI ROS node, which broadcasts only the numerical class id. I hate that. I have to remember whether a given launch file configured MobileNet (20 objects) or YOLOv4 (80 objects), and then look up the class_id number, which differs between the two models for the typical objects in my home (chair/person/bicycle/bird/table/sofa).
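The lookup I keep doing by hand amounts to a sketch like this. The MobileNet-SSD (PASCAL VOC) table below is the standard one; the YOLOv4 80-class COCO table is different, which is exactly why you have to know which launch file ran:

```python
# Map DepthAI's numeric class id back to a label.
# Standard MobileNet-SSD (PASCAL VOC) table: 20 objects plus background.
MOBILENET_LABELS = [
    'background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
    'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
    'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa',
    'train', 'tvmonitor',
]

def label_for(class_id: int, labels=MOBILENET_LABELS) -> str:
    return labels[class_id] if 0 <= class_id < len(labels) else 'unknown'
```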
Bird and bicycle are test images I put in front of the camera with the robot in the other room; then I run back to the desktop to catch the detection class and look up the number. One night I stupidly left the Bird/Bicycle/Person page of images in front of WaLi and went to bed. My wife woke me up at 3:30 AM telling me WaLi was in distress. Because the paper had fallen to cover part of WaLi’s dock, he had actually pushed the dock a meter from where it was taped to the floor and was half climbing the wall trying to get onto it. He left rubber on the floor spinning his wheels against the wall.
As for detection thresholds, I find your MobileNet values are similar to mine, but YOLOv4 detections rarely get above 50-60% confidence for positive detections.
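If I were to encode that, it would be per-model thresholds rather than one global value. The numbers below are just my rough working observations, not values from Julian's node:

```python
# Per-model confidence thresholds reflecting what I observe on my setup.
CONFIDENCE_THRESHOLDS = {
    'mobilenet-ssd': 0.70,  # positives usually score well above this
    'yolov4': 0.45,         # positives often top out around 50-60%
}

def is_positive(model: str, score: float) -> bool:
    return score >= CONFIDENCE_THRESHOLDS.get(model, 0.5)
```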
Google supposedly has a cloud recognizer for “Home Objects”, but I certainly don’t plan to send any images from inside my home across the net. My “end game dream” is that my robot will wander the house with VSLAM and record a few images for later object analysis. Then, while sitting on the dock recharging, he will run a custom “My Latest Home Object Detector”, removing any recognized object from the image, then run some sort of “Unknown Object Segmentation Net” on the remaining image, and fire off an email to me with an unknown image segment, to which I could reply with a label, WordNet morphology, and relations. More complex, and totally insecure, would be for WaLi to run Google Image Search, but I kind of like the idea of WaLi asking me for knowledge.
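Just to make the dream concrete, the docked loop would be a skeleton like this. Everything in it is aspirational: detect_known(), segment_unknowns(), and the email step are hypothetical placeholders, not existing code:

```python
# Skeleton of the docked "unknown object" pipeline; all stages are
# hypothetical placeholders for nets and hooks that don't exist yet.
from pathlib import Path

def detect_known(image: Path) -> Path:
    """Run 'My Latest Home Object Detector'; return the image with
    recognized objects masked out (placeholder)."""
    return image

def segment_unknowns(masked: Path) -> list:
    """Run an unknown-object segmentation net; return crops of the
    remaining segments (placeholder)."""
    return []

def docked_analysis(image_dir: Path) -> None:
    """While recharging, mine the day's images for unknown objects."""
    for image in sorted(image_dir.glob('*.jpg')):
        for crop in segment_unknowns(detect_known(image)):
            # A reply to this email with a label plus WordNet
            # morphology/relations would grow the knowledge base.
            print(f'Would email unknown segment for labeling: {crop}')
```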
Eventually I would like to have a conversational interface (I did speech recognition and speech generation for a living prior to retirement) so I could ask “how many known objects are in your knowledge base?” and “how many objects are awaiting classification?”, since I would want to limit the “What is this?” emails in my inbox each day. It would be interesting to ask for stats such as “What object has the most detections?”, “What object did you see yesterday with the least detections?”, “What object has the most relationships?”, and “What objects have relationships shallower than x deep?” (If I added a display to WaLi, he could detect when I come near, ask if I have time to help him classify a few images, and then display an unknown. The problem, though, is having unknown objects in the speech recognition vocabulary; lacking a context, the word error rate is around 50% anyway, so the email will probably remain. The other problem is that email leaves the house with the image.)
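Those stats questions map naturally onto queries over a detection log. A sketch, assuming a hypothetical SQLite table of (label, score, seen_at) that is my own invention:

```python
# Sketch of the stats queries over a hypothetical SQLite detection log.
import sqlite3

conn = sqlite3.connect('wali_knowledge.db')
conn.execute("""CREATE TABLE IF NOT EXISTS detections
                (label TEXT, score REAL, seen_at TEXT)""")

def most_detected(conn):
    # "What object has the most detections?"
    return conn.execute(
        "SELECT label, COUNT(*) AS n FROM detections "
        "GROUP BY label ORDER BY n DESC LIMIT 1").fetchone()

def least_detected_yesterday(conn):
    # "What object did you see yesterday with the least detections?"
    return conn.execute(
        "SELECT label, COUNT(*) AS n FROM detections "
        "WHERE date(seen_at) = date('now', '-1 day') "
        "GROUP BY label ORDER BY n ASC LIMIT 1").fetchone()
```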
I’m not familiar with RoboStack - seems like another area for investigation I need to add to my TODO: list.
Good job on your detector - I like it a lot.