I have long said every self-respecting robot should come knowing how to follow a wall. It is not as easy for a real robot as one might expect.
Having been writing or stealing wall-following programs for the last 35 years, I have developed a standard approach: write a program that follows the wall.
Today, while I was wandering the ROS Reddit, a young student asked for assistance with his “Q-Table Wall Follower”. Never having heard of a “Q-Table”, and having complete wall-following code to offer, I was intrigued.
As I have noted, everyone is talking about machine learning these days. It turns out a Q-Table is an implementation of “Q-Learning”, which Wiki eloquently obfuscates:
Oh, right, that makes it perfectly clear.
My summary: Instead of programming how to “follow the wall”, program how to “learn to follow the wall” without my old-guy wisdom.
I didn’t understand exactly how the guy’s Q-learning worked, but I figured out it has a critical weak spot at startup if the robot is more than 1 m from any wall, and a nastier weakness: after it learns, and tests what it learned, it restarts dumb again, silently throwing away everything it learned. As for style, he had zero logging and zero statistics gathering.
My analysis is that, assuming it knows what “following a wall” actually IS, it conducts something like a “least-cost” analysis, determining the “least costly” method of achieving it. (They call it “greatest reward”.)
However, I see a number of unanswered questions baked in, such as “what does ‘following a wall’ mean?” and “how do you ‘follow’ a wall?”
Without that basic knowledge, how can the robot perform an undefined behavior?
After further reading, certain things become obvious:
“Model free” is a bit of a misnomer, because the actor (a robot in this case) has to know something about its environment and what is expected of it.
A lane-following robot needs to know enough about its environment to know what a “lane” is, and that the goal is to remain centered within it.
It needs to know enough about itself to know how to react to excursions.
Turn right or left?
Avoid an obstacle ahead?
Change its velocity?
It needs a method to measure the delta between its behavior and its goal, so that it knows whether correction is needed, in what direction, and by how much.
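To make that measurement concrete, here is a minimal sketch (my own illustration, not the student's code) for the wall case, assuming one range reading toward the wall and an invented target distance:

    DESIRED_DIST = 0.5  # desired distance from the wall, in meters (my invention)

    def wall_error(range_to_wall_m):
        """Signed delta between where we are and where we want to be.

        Positive -> too far from the wall, correct toward it.
        Negative -> too close, correct away.
        Magnitude -> how much correction is called for.
        """
        return range_to_wall_m - DESIRED_DIST

    # A reading of 0.8 m gives +0.3: turn toward the wall, and firmly.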
Apparently “model free” means that the robot doesn’t need a complete, or even adequately defined, model of its surroundings or the various conditions and states it might find itself in.
It sounds somewhat like a non-deterministic state machine, though that doesn’t quite fit either, since the robot must still know how to react as its condition and situation change.
This appears to reduce to a least-cost problem in finite statistical analysis. The biggest action driver appears to be a mathematical analysis of the statistically best way to progress from the robot’s current state (what it’s doing and/or where it is now) to a “next state” that approximates the best choice for its near- and long-term situation with respect to its goal.
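For the record, the arithmetic underneath a Q-table is short. A minimal sketch of the standard tabular update, in its textbook form (the states, actions, and constants here are my inventions, not his):

    import random
    from collections import defaultdict

    ALPHA = 0.1    # learning rate: how far to nudge old estimates (assumed)
    GAMMA = 0.9    # discount: how much future reward matters (assumed)
    EPSILON = 0.2  # exploration rate (assumed)

    ACTIONS = ["turn_left", "go_straight", "turn_right"]
    Q = defaultdict(float)  # Q[(state, action)] -> expected long-term reward

    def choose_action(state):
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def learn(state, action, reward, next_state):
        # Nudge Q toward the observed reward plus the best the table
        # currently expects from the next state: the "greatest reward".
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])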
====================
Given your current situation and goal (following a wall), this reduces to a few simple requirements and states:
Knowing the difference between a “wall” (a relatively long and flat object in its field of view) and an “obstacle” (a point-source object).
SLAM mapping of its immediate surroundings? It doesn’t need a map of the entire area, just enough of a map to know what’s in its immediate neighborhood.
Knowing how close is “close enough” and the excursion limits. (X > me > wall)
The ability to measure compliance with these limits (perhaps by using the LIDAR?) and take appropriate corrective action.
The ability to stop and ask for help if none of the above are true.
This reduces to a relatively simple state machine that steps through a series of tests to establish the robot’s current situation and pick the appropriate next action.
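Something like this sketch, hard-coded in my old-guy style, with invented thresholds and a right-hand wall assumed:

    TOO_CLOSE = 0.3  # meters (assumed)
    TOO_FAR = 0.7    # meters (assumed)

    def next_action(front_m, right_m):
        """Step through the tests in order; the first match wins."""
        if front_m is None or right_m is None:
            return "stop_and_ask_for_help"  # no valid readings: give up gracefully
        if front_m < TOO_CLOSE:
            return "turn_left"              # wall or obstacle dead ahead
        if right_m < TOO_CLOSE:
            return "veer_left"              # hugging the wall too tightly
        if right_m > TOO_FAR:
            return "veer_right"             # drifting away from the wall
        return "drive_forward"              # inside the comfort band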
And the problem with the learning approach is that, within the bounds of any Markov decision process, “finite” means relatively few unknowns, because as the scope of possible states and actions grows, the number of Q-states rapidly expands to the point of being ungovernable.
In other words, a Markov analysis is (currently) limited to relatively simple, tightly bounded (mathematically speaking) problems.
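For scale (my numbers, purely illustrative): three distance zones and three actions make a Q-table of 3 × 3 = 9 entries, which fills in quickly. Discretize distance into 50 bins and bearing into 36, allow 7 turn rates, and the table balloons to 50 × 36 × 7 = 12,600 entries, every one of which must be visited repeatedly before its statistics settle.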
IMHO, this is better done with the results of a SLAM mapping analysis . . .
Can you take the results of a partial SLAM map and say “this is a wall” or “this is an obstacle”, and use that analysis to find (the closest and/or a particular) wall?
. . . as a SLAM map is effective over the entire resolving range of the device doing the mapping, instead of being limited to within a meter or so.
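I suspect yes, at least crudely. One way (my own guess, not tied to any particular SLAM package) is to fit a line to each cluster of nearby points and call the cluster a wall if it is long and flat, an obstacle if it is compact. A sketch, with invented thresholds:

    import numpy as np

    MIN_WALL_LENGTH = 0.5  # meters of extent before we call it a wall (assumed)
    MAX_FLATNESS = 0.05    # allowed scatter off the fitted line, meters (assumed)

    def classify(points_xy):
        """points_xy: N x 2 array of (x, y) points from one detected cluster."""
        pts = np.asarray(points_xy, dtype=float)
        if len(pts) < 3:
            return "obstacle"  # too few points to call anything a wall
        centered = pts - pts.mean(axis=0)
        # Principal directions of the cluster: a wall is long one way, thin the other.
        _, sv, _ = np.linalg.svd(centered, full_matrices=False)
        along = 2.0 * sv[0] / np.sqrt(len(pts))   # rough extent along the fit
        across = sv[1] / np.sqrt(len(pts))        # rough scatter off the fit
        if along > MIN_WALL_LENGTH and across < MAX_FLATNESS:
            return "wall"
        return "obstacle"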
Additionally, if you can make the dock distinctive in some way, a similar logic analysis can be used to help the robot find its way home again.
“Relatively simple” is certainly relative. This guy has only a handful of states and a handful of actions to learn. I would have hoped a wall-following machine learner would not just learn:
when to turn
what direction to turn
when to drive
but also:
how fast or how much to turn
since the word “optimal” was thrown into the definition of Q-Learning. In fact, I have the feeling his system, with its three fixed actions and three hard-edged zones, will require many more iterations than one that learns how much to turn based on its distance from the wall.
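For comparison, the hard-coded answer to “how much to turn” is essentially one line of proportional control (target, gain, and cap invented here); a Q-learner would need many finely sliced distance states and turn-rate actions just to approximate the same curve:

    DESIRED_DIST = 0.5  # meters (assumed)
    KP = 2.0            # proportional gain (assumed)
    MAX_TURN = 1.0      # turn-rate cap, rad/s (assumed)

    def turn_rate(right_m):
        """Continuous command: the bigger the error, the harder the turn.

        Sign convention here: positive steers toward the wall on the right.
        """
        error = right_m - DESIRED_DIST
        return max(-MAX_TURN, min(MAX_TURN, KP * error))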
Perhaps the guy’s program is simplified, but the concept is more comprehensive.
You know my fascination with wall-following and safe-wandering programs for every robot. My ROS 2 Wander program turned out to have an “indecisive black hole” when the bot wandered down a narrow, triangular canyon. It would be interesting to build some sort of machine-learning wanderer and see how fragile its learning is. I am guessing it will take more time to test and improve the learning than to test and improve my hard-coded wanderer, and my hard-coded wanderer’s behavior will always be more “explainable”.
My biggest complaint about neural nets, and all non-rule-based artificial intelligence systems, is that there is no after-the-fact explanation, no answer to “why did you do that?” I’ve seen some AI systems that actually train an explainer on themselves to give some sort of explanation, but I’m guessing it will be something like “node 49 saw increased activation”, which will still be meaningless.