GoPiGo OS has a MobileNet object-recognition TensorFlow Lite demo using the PiCamera on the GoPiGo3.
Does anyone know if there is a TFlite model that recognizes American Sign Language spelling? Recognizing ASL words or phrases would involve multi-frame recognition, but almost all of the letter signs are static hand positions. I think only “J” and “Z” involve motion.
Nowhere fast… I can’t install TFlite on Ubuntu 22.04 64-bit for some reason.
Resetting to re-familiarize myself with GoPiGo OS 3.0.3 that already has TFlite installed.
That is probably the most useful approach for everyone else anyway, if I can get a demo working. I had trouble connecting GoPiGo OS to my network last time I tried it, so I thought I would try with Dave’s current OS first.
Tested the existing tflite classify_picamera example on Dave (whose camera images are rotated 180°) - it works on an “electric fan” and a “monitor” but fails to recognize my scissors for some reason.
So on to try ASL!
Initial thrill - it loads the model and labels without complaint.
It recognizes “nothing” very well.
Everything else comes out as “C”, and I mean everything.
I don’t know how to view a preview with the GoPiGo OS, so debugging will be a bit difficult.
This test seems to suggest success is possible.
TensorFlowLite on GoPiGo OS
1) Connect GoPiGo OS to WiFi for convenience
2) Bring down the ASL model and label file
- su jupyter (password: jupyter)
- cd tflite/tflite_models
- sudo curl -o labels_ASL.txt https://raw.githubusercontent.com/sayannath/American-Sign-Language-Detection/master/ASL%20App/app/src/main/assets/labels.txt
- sudo curl -o ASL.tflite https://raw.githubusercontent.com/sayannath/American-Sign-Language-Detection/master/ASL%20App/app/src/main/assets/model.tflite
python3 /home/jupyter/tflite/tflite_examples/lite/examples/image_classification/raspberry_pi/classify_picamera.py \
--model /home/jupyter/tflite/tflite_models/ASL.tflite \
--labels /home/jupyter/tflite/tflite_models/labels_ASL.txt \
--preview no \
--confidence 0.7
Dave's camera is rotated 180°, so I needed to add a rotation line:
try:
    camera.rotation = 180   # <-- added: Dave's camera is mounted upside-down
    stream = io.BytesIO()
- Made a copy with the change called asl_picamera.py in /home/jupyter/tflite
Running asl model:
python3 /home/jupyter/tflite/asl_picamera.py \
--model /home/jupyter/tflite/tflite_models/ASL.tflite \
--labels /home/jupyter/tflite/tflite_models/labels_ASL.txt \
--preview no --confidence 0.7
Running classify model:
python3 /home/jupyter/tflite/asl_picamera.py \
--model /home/jupyter/tflite/tflite_models/mobilenet_v1_1.0_224_quant.tflite \
--labels /home/jupyter/tflite/tflite_models/labels_mobilenet_quant_v1_224.txt \
--preview no --confidence 0.7
Carl came to the rescue - he is also set up to run TFlite and has a desktop (Raspbian4Robots). Carl saves the images that he classifies and I can look at them on Carl’s desktop.
pi@Carl:~/Carl/Examples/TF/GoPiGo $ ./run_asl.sh
Starting TensorFlow Lite Classification With PiCamera at 640x480
C 0.62 289.2ms
Y 0.61 316.3ms
C 0.74 306.4ms
C 0.84 325.5ms
C 0.65 303.1ms
C 0.90 293.1ms
C 0.64 341.8ms
C 0.61 312.4ms
Y 0.68 313.9ms
Y 0.71 325.2ms
Y 0.81 293.3ms
C 0.91 323.4ms
C 0.97 289.7ms
Here are the images and confidently wrong classifications:
This is pretty cool even though the classifications are not correct: Carl is sporting a Pi3B+, and at roughly 300 ms per inference this shows the stock GoPiGo3 running TFlite can do about three classifications per second (if I can figure out why it is so confidently wrong…)
Wow, that really is confidently wrong. Can you tell it to also give the second and third most likely guesses? That might be interesting. None of the guesses you showed are all that high a probability.
/K
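(One way to do that, sketched below: if the model’s output is a single 1-D vector of class scores, numpy can pull out the three best indices. This is my own sketch, not the stock example’s API; classify_top_k is a hypothetical helper, and the dequantization step only applies if the output tensor is quantized uint8.)

# Sketch: print the top-3 labels and scores from a TFLite classification output.
import numpy as np

def classify_top_k(interpreter, labels, k=3):
    output_details = interpreter.get_output_details()[0]
    scores = np.squeeze(interpreter.get_tensor(output_details['index']))
    if output_details['dtype'] == np.uint8:               # quantized model output
        scale, zero_point = output_details['quantization']
        scores = scale * (scores.astype(np.float32) - zero_point)
    top = np.argsort(scores)[::-1][:k]                    # indices of the k highest scores
    return [(labels[i], float(scores[i])) for i in top]

# after interpreter.invoke():
#   for label, score in classify_top_k(interpreter, labels, k=3):
#       print('%s %.2f' % (label, score))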
In the demo image, it shows “Frame, Crop, View” parameters with “Crop 224x224”, and the MobileNet doc mentions this dimension (“Our primary network (width multiplier 1, 224 × 224)”), but elsewhere the doc mentions the input as 320x320: “Both MobileNet models are trained and evaluated with … The input resolution of both models is 320 × 320.”
I don’t know if they are using a segmentation step to “find a hand” and then apply a crop of 320² or 224² around “the hand”, but it seems like I need to, at a minimum, try cropping out the center 320² of the picamera’s 640x480 image and see what happens.
But the actual ASL doc states “The model takes an input image of size 224x224 with three channels per pixel (RGB - Red Green Blue).” Perhaps I will have to crop out the center 224² of the image, but without a preview on the GoPiGo3, it will be impossible to know when your hand is in the proper position.
Oh yeah, Carl does allow preview - just tested that. GoPiGo OS example may not be possible but if I can figure this out - Carl will be an ASL star!
Argh - closing the ASL program while Carl is on his dock fools the charger into thinking it should switch to trickle charge mode. Carl detects this early trickle, stating “Getting off the dock. I need a real charge”, and then quickly re-docks.
No problem, there is an example in the picamera docs - oops, that only works with Python 2.7.
No problem, there is a Python 3.7 example in the latest picamera docs - oops, that only works with picamera 1.14, which will not be released… now there is picamera2.
The GoPiGo3 MobileNet TFlite example captures a 640x480 image and performs a “resize((width, height), ANTIALIAS)” to the MobileNet input shape. When I checked the MobileNet input shape, it returned 224, 224, the same as the ASL model returns - that would seem good.
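For reference, here is how to double-check what a .tflite model actually expects (a minimal sketch using the tflite_runtime interpreter and the model path from the steps above):

# Sketch: print the input shape and dtype a .tflite model expects.
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter('/home/jupyter/tflite/tflite_models/ASL.tflite')
interpreter.allocate_tensors()
detail = interpreter.get_input_details()[0]
print(detail['shape'])   # e.g. [  1 224 224   3] -> batch, height, width, channels
print(detail['dtype'])   # numpy.uint8 for a quantized model, numpy.float32 otherwise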
Thinking about the program and the resize again - there is an interesting thing happening. The image is captured in a 4:3 aspect and resized to a 1:1 aspect. The resize means the entire captured image is fed to the MobileNet model, but the subject of the image will be distorted, since the horizontal axis gets squeezed more than the vertical. (I think.)
I think what I need to do is crop the 640x480 image to 480x480, then resize with anti-aliasing to 224x224. If I simply crop a 224x224 out of the center of the image, the user must blindly find the exact center of the camera’s field of view and guess how far away to be to fill the 224² area to the max. By cropping to 480² first, the “I see your hand completely” box is twice as big, perhaps making the distance from the camera less sensitive.
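A minimal sketch of that crop-then-resize, assuming a PIL image from the 640x480 capture (the helper name is mine, not from the example):

# Sketch: center-crop the 4:3 capture to a square, then resize to the model input size.
from PIL import Image

def center_square_resize(image, model_size=224):
    width, height = image.size                                  # e.g. 640, 480
    side = min(width, height)                                   # 480
    left = (width - side) // 2
    top = (height - side) // 2
    square = image.crop((left, top, left + side, top + side))   # 480x480 center crop
    return square.resize((model_size, model_size), Image.LANCZOS)  # anti-aliased (ANTIALIAS in older Pillow)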
Now if Carl will be a good boy and get off his dock so I can resume testing…
Eventual plan, once something can find a hand bounding box:
if the hand bounding box has any dimension larger than the ASL model input size:
    perform a square CROP of the max bounding box dimension,
    then RESIZE with anti-aliasing to the model input size
else (the bounding box fits within the model input size):
    CROP around the hand to the model input size.
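In Python that logic might look roughly like this (a sketch only - the (left, top, right, bottom) bounding box would have to come from some hand detector I don’t have yet):

# Sketch: crop around a detected hand so it fills the ASL model input.
from PIL import Image

def crop_hand(image, bbox, model_size=224):
    left, top, right, bottom = bbox                    # hypothetical hand-detector output
    side = max(right - left, bottom - top)
    cx, cy = (left + right) // 2, (top + bottom) // 2
    if side > model_size:
        # square crop of the max bounding-box dimension, then anti-aliased resize down
        half = side // 2
        square = image.crop((cx - half, cy - half, cx + half, cy + half))
        return square.resize((model_size, model_size), Image.LANCZOS)
    # bounding box already fits: crop a model-sized square centered on the hand
    half = model_size // 2
    return image.crop((cx - half, cy - half, cx + half, cy + half))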
But first I have to prove the ASL model even works (very suspect at this point),
and I’m running into all sorts of “weird ether” issues.
First, issues with my very old Bluetooth chiclet keyboard made me quit everything last night.
Then with a wired keyboard, I got going again this morning until Carl started flashing his “Hey, I’ve got a problem here” LED. life.log showed I2C Bus Failure Detected!
I did a restart (forgetting that only a cold boot will clear an I2C bus failure), and the dreaded flashing LED returned.
Then a shutdown and startup cleared the I2C issue, but caused an early trickle that I didn’t notice, so Carl got off the dock early.
2023-03-30 09:34|[healthCheck.py.main]I2C Bus Failure Detected
2023-03-30 09:34|[healthCheck.py.main]Initiating 300 second watch for I2C recovery
2023-03-30 09:40|[healthCheck.py.main]I2C not recovered after 300 seconds
2023-03-30 09:44|[logMaintenance.py.main]** Running TFlite killed I2C for some reason, restarting **
2023-03-30 09:45|[logMaintenance.py.main]** Reboot Requested **
2023-03-30 09:45|[logMaintenance.py.main]** 'Current Battery 10.99v EasyGoPiGo3 Reading 10.18v' **
2023-03-30 09:46|[healthCheck.py.main]I2C not recovered after 300 seconds
2023-03-30 09:47|------------ boot ------------
2023-03-30 09:47|[juicer.py.main]---- juicer.py started at 11.00v
2023-03-30 09:49|[healthCheck.py.main]I2C Bus Failure Detected
2023-03-30 09:49|[healthCheck.py.main]Initiating 300 second watch for I2C recovery
2023-03-30 09:55|[healthCheck.py.main]I2C not recovered after 300 seconds
2023-03-30 09:57|[logMaintenance.py.main]** ouch restart doesn't clear I2C failure, need cold boot **
2023-03-30 09:57|[logMaintenance.py.main]** Routine Shutdown **
2023-03-30 09:57|[logMaintenance.py.main]** 'Current Battery 11.91v EasyGoPiGo3 Reading 11.10v' **
2023-03-30 09:59|[healthCheck.py.main]WiFi check failed (8.8.8.8)
2023-03-30 10:02|------------ boot ------------
2023-03-30 10:02|[juicer.py.main]---- juicer.py started at 10.37v
2023-03-30 10:05|[juicer.py.undock]---- Dismount 3672 at 10.3 v after 0.0 h recharge
Between my wife’s huge stream of “new computer acting funny” issues to solve,
and a particularly stealthy “not sharp pictures with some combination of camera/lens/teleconverter between two cameras, two lenses and one teleconverter”,
and my keyboard delete key going berserk well into a too-long editing session,
and Carl declaring an “I2C emergency”,
and those mystery “WiFi check failed” messages in Carl’s log for the last few years,
and my desk is a mess,
and my HOA board is corrupt,
and the country is doomed,
and nature is adapting better than I am,
and …
I’m just not feeling like I have any strength left.
While the research paper reported 98.8% accuracy for the TensorFlow Lite ASL model running on an Android phone, I cannot get above 0% accuracy with the published model on the Raspberry Pi. I am out of ideas and out of my league.