Create an "autonomous" architecture for Charlie/Charline?

Forking the discussion from ROSbot GoPi5Go-Dave Architecture because I was hijacking the thread:

@cyclicalobsessive, Here’s my idea:

  1. Four main modes of action:
  • Do something. (“Chase the Cat” or whatever)
  • Avoid. (Am I too close to something?)
  • Escape. (Did I just hit something?)
  • Emergency stop.  (Sudden, unexplained, massive failure.)
  2. Proposed implementation:
  • Five software objects: (in priority order)

    • Emergency Stop
      The Emergency Stop is an “Oh, 🤬!!” exception-handling routine that, ideally, will never be needed.  If activated, it assumes total control, stops the robot, and kills the entire process stack by throwing a non-maskable exception, stopping everything.

    • The arbiter
      The arbiter is responsible for deciding which of the other three software objects has control: granting, withholding and/or revoking permission to control the robot.  This would essentially implement a “cooperative multi-tasking system”, sort of.

    • Escape
      Escape is responsible for monitoring the bumper sensor.  If the bumper sensor is activated, it requests permission to run from the arbiter and, once permission is granted, performs a movement to escape from the obstacle, such as a back-up and rotate maneuver.

    • Avoid
      Avoid is responsible for monitoring the distance sensor.  If the distance sensor registers the presence of an obstacle within “X” cm, it requests permission to run from the arbiter and, once permission is granted, performs a movement to avoid the obstacle, such as a change of direction until the obstacle is no longer detected.

    • “Do something”
      “Do something” is the “main” functionality routine for the robot.  It is responsible for making the robot do what I want it to do at that point in time.

  • Inter-process communication model:

    • Each process, (except for the arbiter), has access to an individual set of status and control flags:

      • Alive:
        This flag serves as a “heartbeat” flag.  Every arbiter cycle, this flag is cleared for each process, and it is set again when the process runs its next polling cycle.  The arbiter checks whether the flag is set before clearing it, and the process checks that the arbiter cleared it.

      • Request_to_run:
        This flag is set by the process to indicate to the arbiter that it wants to run.

      • Allowed_to_run:
        This flag is set by the arbiter granting permission to the process to run.  The process is responsible for checking the status of this flag every polling cycle and, if cleared, the process must stop as rapidly as possible.

      • Running:
        This is set by the process to indicate that it has control of the robot.  The process clears it either when it has finished what it was doing or immediately after it stops because allowed_to_run was cleared.

      • Mutex:
        This “flag” is implemented as a mutex that arbitrates access to a particular process’s flags between the arbiter and that process, guaranteeing atomic access.

    • Basic process flow:

      • The arbiter starts and creates all the flags and mutexes.

      • The arbiter starts the three processes and verifies, via the “Alive” flags, that they are running.

      • The main (Do Something) process immediately raises its request_to_run flag.

      • If neither of the other two processes wants to run, the main process is allowed to run and raises its “running” flag.

      • If one of the other two processes raises its request_to_run flag, the arbiter clears the main process’s allowed_to_run flag, waits for the “running” flag to clear, and then allows the requesting process to run.

      • If both processes raise their request_to_run flag, the Escape process is allowed control before the Avoid process.

      • Once neither the Escape nor the Avoid process want to run, control is returned to the main program.

  • Heartbeat process integrity control

    • Process failure is monitored by the three “Alive” flags.
      • Every time the arbiter runs, after it handles a particular process’s state, it clears that process’s “Alive” flag.

      • The arbiter therefore clears all three “Alive” flags in the course of one polling cycle.  If a particular process’s Alive flag has not been set again after “X” polling cycles, the arbiter records a “<process name> died” event, signals all processes to stop, and transfers control to the Emergency routine.  (If the arbiter still has control over the other three processes’ state, it kills the processes before it continues.)

      • Each process, as it begins its processing loop, first checks whether the “Alive” flag has been cleared.  If it has, the process sets it again and continues with its normal processing.  If the Alive flag hasn’t been cleared after “X” polling loops, the process stops everything it’s doing, records an “Arbiter Died” event, and immediately transfers control to the Emergency routine.
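A minimal Python sketch of one process’s flag set and both halves of the heartbeat check described above, assuming the behaviors run as threads in one program so they can share these objects. `ProcessFlags`, `arbiter_heartbeat()`, `behavior_heartbeat()`, the two miss counters, and `MAX_MISSED` (standing in for “X”) are hypothetical names, not an existing API.

```python
import threading
from dataclasses import dataclass, field

MAX_MISSED = 3  # hypothetical stand-in for "X" polling cycles


@dataclass
class ProcessFlags:
    """Status/control flags for one behavior (Escape, Avoid or Do Something)."""
    name: str
    alive: bool = False
    request_to_run: bool = False
    allowed_to_run: bool = False
    running: bool = False
    # Extra bookkeeping (an assumption, not in the original design):
    # consecutive missed beats as counted by the arbiter and by the behavior.
    missed_by_arbiter: int = 0
    missed_by_behavior: int = 0
    # One lock per behavior plays the role of the "Mutex" flag.
    lock: threading.Lock = field(default_factory=threading.Lock,
                                 repr=False, compare=False)


def arbiter_heartbeat(all_flags, max_missed=MAX_MISSED):
    """Arbiter side, once per cycle: check each Alive flag, then clear it.

    Returns the names of behaviors that missed max_missed beats in a row.
    """
    dead = []
    for flags in all_flags:
        with flags.lock:
            if flags.alive:
                flags.missed_by_arbiter = 0
            else:
                flags.missed_by_arbiter += 1
            flags.alive = False  # the behavior must set it again
            if flags.missed_by_arbiter >= max_missed:
                dead.append(flags.name)
    return dead


def behavior_heartbeat(flags, max_missed=MAX_MISSED):
    """Behavior side, at the top of each polling loop.

    Returns True if the arbiter has not cleared Alive for max_missed loops,
    i.e. the arbiter appears to have died.
    """
    with flags.lock:
        if not flags.alive:      # the arbiter cleared it since our last loop
            flags.alive = True
            flags.missed_by_behavior = 0
        else:                    # still set: the arbiter has not run
            flags.missed_by_behavior += 1
        return flags.missed_by_behavior >= max_missed
```

In a real program, a non-empty list from `arbiter_heartbeat()`, or `True` from `behavior_heartbeat()`, would be the point where control passes to the Emergency Stop routine.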

  3. Process flow control
  • The main process (“Do Something”) gets control by default: it requests permission to run, is granted permission, sets its “Running” flag, and begins running its code.

  • Each of the other exception processes (Escape and Avoid) checks whether the thing(s) it’s responsible for need attention.  If so, the routine raises its associated request_to_run flag and waits for its own allowed_to_run flag to be set.  Once that flag is set, it clears the request_to_run flag and sets the running flag.  It then begins doing its specific task.

    • If, while waiting for permission to run, the event that triggered the request is resolved, the routine clears the request_to_run flag.  (For example, the bumper records an impact event and the distance sensor records an “object too close” event at the same time.)

      Since the Escape (bumper) routine is allowed to run first, it backs away and changes direction.  Because Escape changed the robot’s position, the robot may no longer be too close to anything; if so, the “Avoid” process clears its request_to_run flag.
       
  • Each process continues to check the allowed_to_run flag. If set, it continues.  If cleared, it immediately halts its activity and clears the running flag.

  • Each process sets the alive flag every polling loop.
    This interaction between the arbiter (clearing the flag) and each process (setting the flag) lets the arbiter detect that a process has died and lets each process detect that the arbiter has died.
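One polling cycle of an exception process (Escape or Avoid) might look like this, under the same threading assumption. `behavior_poll()` and its `needs_attention()`/`do_step()` callbacks are hypothetical; the maneuver is assumed to be split into short steps so the process can re-check allowed_to_run between them, and Alive handling is left out to keep it short.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ProcessFlags:
    """The control flags one behavior shares with the arbiter."""
    request_to_run: bool = False
    allowed_to_run: bool = False
    running: bool = False
    lock: threading.Lock = field(default_factory=threading.Lock,
                                 repr=False, compare=False)


def behavior_poll(flags, needs_attention, do_step):
    """One polling cycle of a behavior such as Escape or Avoid.

    needs_attention() -> bool : is the triggering condition still present?
    do_step() -> bool         : run one short slice of the maneuver,
                                returning True once the maneuver is finished.
    Returns a label describing what happened this cycle.
    """
    attention = needs_attention()  # read the sensor before taking the lock
    with flags.lock:
        if flags.running and not flags.allowed_to_run:
            # Permission revoked: give up control at once. The caller is
            # assumed to stop the motors when it sees "halted".
            flags.running = False
            return "halted"
        if not flags.running:
            if flags.allowed_to_run and flags.request_to_run:
                # Permission granted: drop the request and take control.
                flags.request_to_run = False
                flags.running = True
            elif attention:
                flags.request_to_run = True
                return "waiting"
            else:
                # Condition resolved (or never arose): withdraw any request.
                flags.request_to_run = False
                return "idle"
    # Do the work outside the lock so the arbiter is never blocked for long.
    if do_step():
        with flags.lock:
            flags.running = False
        return "done"
    return "running"
```

For the main “Do Something” process, `needs_attention()` would simply always return `True`, so it re-requests control as soon as it has been halted.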

  4. Process flow summary:
  • Assume the main program is running.

    • Its “Running” flag is set
    • Its “Allowed_to_run” flag is set
    • Its “Request_to_run” flag is cleared.
    • Its “Alive” flag should be toggling at a rate governed by how fast the arbiter and the main routine cycle.
       
  • Assume that an “event” occurs.  (either a bumper impact or a distance sensor event.)

    • The process responsible for that event raises its “Request_to_run” flag.

    • On its next run cycle, the arbiter recognizes the raised flag as an event that takes priority over the main program.  (If the requesting process did not have higher priority than the one running, it would have to wait until the higher-priority task(s) had ended.)

      • It clears the main program’s “Allowed_to_run” flag, revoking permission to run, and waits for the main program’s “Running” flag to clear.

      • It sets the requesting process’s “Allowed_to_run” flag.

    • The requesting process then releases the “Request_to_run” flag and sets its “Running” flag.

    • The arbiter sees the “Running” flag for that process asserted and continues polling.

    • The running process continues to run until:

      • The process completes and releases its “Running” flag, which allows the arbiter to restart any pending lower-priority tasks.

        • In the case of the main task, it continues to run until it’s either interrupted again or ends.
           
      • It is interrupted by a higher priority process.

      • An “Emergency” event occurs which kills everything.
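The arbiter’s side of this summary can be sketched as a non-blocking step function called once per arbiter cycle. This is a sketch under assumptions: `arbiter_step()` and the numeric `priority` field are hypothetical, the flags are plain attributes (locking omitted for brevity), and instead of busy-waiting for a revoked process’s Running flag to clear, the arbiter simply re-checks on a later cycle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessFlags:
    name: str
    priority: int            # lower number wins: Escape 0, Avoid 1, main 2
    request_to_run: bool = False
    allowed_to_run: bool = False
    running: bool = False


def arbiter_step(behaviors: List[ProcessFlags]) -> Optional[str]:
    """One non-blocking arbiter cycle; returns the name granted control, if any."""
    # A behavior that has finished (allowed, but neither running nor
    # requesting) gives its permission back.
    for b in behaviors:
        if b.allowed_to_run and not b.running and not b.request_to_run:
            b.allowed_to_run = False

    requesters = [b for b in behaviors if b.request_to_run]
    if not requesters:
        return None
    best = min(requesters, key=lambda b: b.priority)

    # Revoke permission from every lower-priority behavior.
    for b in behaviors:
        if b is not best and b.priority > best.priority:
            b.allowed_to_run = False

    # Grant control only once nobody else is still running.
    if any(b.running for b in behaviors if b is not best):
        return None
    if best.allowed_to_run:
        return None          # already granted; waiting for it to start
    best.allowed_to_run = True
    return best.name
```

Because Escape has the lowest number, it wins when Escape and Avoid request at the same time, matching the priority order in the proposal; a lower-priority request never preempts a higher-priority process that is already running.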

I’m sure there’s more to think about but this is my first pass at fleshing out an “architecture”.


Looks like a good approach.

I can always think of problems and improvements for any concept, but I don’t see any flaw in yours, so go for it.


My mental model envisions several different state machines, running simultaneously.

  1. How do I create them such that they run independently?

  2. How do I control them?  (i.e. start and/or stop them running.)

  3. What is the best way to communicate with them?  Signals?  A common memory area/global variables?

    • In JavaScript, I can create something like a “structure” where a particular variable has “attributes” that can be assigned values.
      e.g.:
this:
    Var1 = value1;
    Var2 = value2;
    [. . .]
    VarN = valueN

Then later:

Something(this.Var1)

where this.Var1 evaluates to value1.

avoid_flag:
    Alive = False;
    Request_to_run = False;
    Allowed_to_run = False;
    [. . .]

Is this available in Python?

  4. If I use variables, how do I make them universally available to all the processes - won’t they have their own unique process environment that’s isolated from the rest of the machine?

  5. What’s the best way to create a named variable dedicated to a process that might not even exist yet?
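Python offers several close equivalents. A minimal sketch using the flag names from this post; `Flags`, `flags_by_name`, and `flags_for()` are hypothetical names. Objects like these are shared by every thread in one program, but separate processes (for example, ones started with `multiprocessing`) each get their own copy, so sharing across processes would need something like `multiprocessing.Manager` or shared memory.

```python
from dataclasses import dataclass
from types import SimpleNamespace

# Closest to the JavaScript object literal: attributes can be added at will.
avoid_flag = SimpleNamespace(alive=False, request_to_run=False,
                             allowed_to_run=False)
avoid_flag.running = False          # a new attribute, added later


# A dataclass fixes the field names and gives each one a default.
@dataclass
class Flags:
    alive: bool = False
    request_to_run: bool = False
    allowed_to_run: bool = False
    running: bool = False


# A dictionary keyed by behavior name can hold flags for a behavior that has
# not been started yet: setdefault() creates the entry on first use.
flags_by_name = {}


def flags_for(name):
    return flags_by_name.setdefault(name, Flags())
```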


Arrgh - it is easy to write a basic multi-threaded program or a basic multi-processing program, but…
once you add inter-thread/inter-process communication and exception handling, either approach gets nasty-looking very quickly, and the choice between threads and processes affects what tools are available.

Hints:

  • global vars are not recommended, but they are available to (and shared by) all threads in a program.
  • If you use globals, protect them with a threading.Lock()
  • state machines can be explicit and basic, or you can use the python-statemachine package

basic explicit state machine with global variables

basic threading with globals and using threading lock example

basic mobile robot python-statemachine example

complex threading example with exception handling
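A minimal sketch of the first two hints, plus one common way to stop a thread on request (`threading.Event`); `bump_count`, `record_bumps()`, `run_threads()`, and `poll_until_stopped()` are made-up names for illustration.

```python
import threading

bump_count = 0                   # a module-level ("global") variable
bump_lock = threading.Lock()     # guards every access to bump_count
stop_event = threading.Event()   # set() it to ask worker threads to finish


def record_bumps(n):
    """Increment the shared counter n times; the lock makes each += atomic."""
    global bump_count
    for _ in range(n):
        with bump_lock:
            bump_count += 1


def run_threads(n_threads=4, bumps_each=10_000):
    """Start several threads that share bump_count and wait for all of them."""
    global bump_count
    with bump_lock:
        bump_count = 0
    threads = [threading.Thread(target=record_bumps, args=(bumps_each,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # wait for every thread to finish
    return bump_count


def poll_until_stopped(period_s, ticks):
    """A worker loop that keeps running until stop_event is set."""
    while not stop_event.wait(period_s):   # wait() returns True once set
        ticks.append(1)
```

With the lock in place, `run_threads()` always returns `n_threads * bumps_each`; without it, the unsynchronized read-modify-write of `bump_count += 1` is not guaranteed to be atomic.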


And I’m getting a splitting headache just thinking about it.

I’m going to have to follow your links and investigate this.

==============================

I’ve been running this in the back of my mind while I’m doing other things, and. . . . .

. . . the number of things that I have to keep track of seems to be expanding out of control - to the point that I feel like I’m designing a miniature operating system!  And that doesn’t even consider exception handling inside the individual routines themselves - and the effect of an exception HERE on what’s happening way over THERE.

I’m thinking that I’m going to have to make some radical simplifications in my model:

  1. A single process with a single thread.
  2. A simple sequential polling arrangement that checks everything and transfers control to whatever needs doing, falling through to the main process.  (i.e., no signalling.)
  3. “Exceptions” (events like the bumper or the distance sensor) run until resolved and cannot be interrupted.
  4. All exceptions fall through to a universal handler that stops the robot, then posts the exception.  (I suspect that a thrown, unhandled exception will stop the robot anyway, since I believe that whatever Python process is running gets killed.  But then again, maybe not?)
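A minimal sketch of items 1 to 4, assuming the sensors are read once per pass into a dictionary with hypothetical keys `bumped` and `too_close`. `FakeRobot` is a made-up stand-in that only records commands so the loop can be tried without hardware; `escape()`, `avoid()`, `do_something()`, and `control_loop()` are hypothetical too.

```python
class FakeRobot:
    """Hypothetical stand-in for the real robot: it only records commands."""

    def __init__(self):
        self.log = []

    def forward(self):
        self.log.append("forward")

    def backward(self):
        self.log.append("backward")

    def turn(self):
        self.log.append("turn")

    def stop(self):
        self.log.append("stop")


def escape(robot):
    # Runs to completion: nothing else is polled until it returns.
    robot.backward()
    robot.turn()


def avoid(robot):
    robot.turn()


def do_something(robot):
    robot.forward()


def control_loop(robot, sensor_frames):
    """Single thread, single loop: check events in priority order each pass."""
    try:
        for sensors in sensor_frames:      # one dict of readings per pass
            if sensors.get("bumped"):
                escape(robot)
            elif sensors.get("too_close"):
                avoid(robot)
            else:
                do_something(robot)        # fall through to the main behavior
    finally:
        # Runs on normal exit, Ctrl-C, or any exception, before the exception
        # propagates and Python prints the traceback.
        robot.stop()
```

Rather than depending on what happens to the motors when the Python process exits, the `finally` block stops them explicitly; any exception still propagates afterwards, so it is reported only after the robot has been told to stop.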

Even that is going to have more knots, kinks, and pesky fleas than a small furry animal.


Excellent approach. Some thoughts:

  • For items 1 and 4: main() should have an outer try/except/finally, with robot.stop() in the finally block:
#!/usr/bin/env python3

from easygopigo3 import EasyGoPiGo3
import traceback
import logging
import time

# MAIN function
def main():

    # set up logger
    log_format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=log_format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    # declare the robot
    robot = EasyGoPiGo3(use_mutex=True)
    # Wrap main in try/except/finally
    try:
        # do all the main stuff
        logging.info("Main(): starting now")
    except KeyboardInterrupt:
        logging.info("Main(): Detected ctrl-C, cleaning up")
    except Exception as e:
        logging.info("Main(): handling main exception: %s", e)
        traceback.print_exc()
    finally:
        # runs however main() exits, so the motors are always stopped
        logging.info("Main(): executing finally")
        robot.stop()

if __name__ == '__main__':
    main()

For Item 3: these are not “exceptions”; they are “events”. As in your original thinking, use one flag for an event being true and another for its event handler being active. Or, instead of a handler-active flag, use the “state” variable: when the event flag is true, force the state variable to the event handler’s state:



...

def initial():
    global state, prior_state
    # init stuff
    prior_state=cruise
    state=cruise

def cruise():
    # cruise stuff
    pass

def avoid():
    global state, prior_state
    # avoid stuff
    prior_state=avoid
    state=cruise

def escape():
    global state, prior_state
    # escape stuff
    state=prior_state

def analyse_sensors():
    global state, prior_state
    ...
    if bumpers:
        prior_state=state
        state = escape
    elif distance < TOO_CLOSE:
        prior_state=state
        state = avoid
    else:
        prior_state=cruise
        state=cruise

def do_state(state):
    state()

def main():
    global state, prior_state
    ...
    normal_operation = True
    # run initial() once before the loop: analyse_sensors() assigns state
    # on every pass, so setting state = initial here would never run it
    initial()
    while normal_operation:
        read_sensors()
        analyse_sensors()
        do_state(state)
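An alternative that avoids `global` entirely (a different technique from the code above, not what it uses): each state function does its work and returns the next state, and `analyse_sensors()` overrides that choice when an event fires. Sensor readings are simulated as `(bumped, distance_cm)` pairs; `TOO_CLOSE_CM`, `run()`, and the `log` list are hypothetical, and this version always returns to cruise after an event rather than to the prior state.

```python
TOO_CLOSE_CM = 20   # hypothetical distance threshold


def initial(log):
    log.append("init")
    return cruise


def cruise(log):
    log.append("cruise")
    return cruise


def avoid(log):
    log.append("avoid")
    return cruise            # resume cruising once the avoid maneuver is done


def escape(log):
    log.append("escape")
    return cruise


def analyse_sensors(state, bumped, distance_cm):
    """Events override the current state; otherwise keep the current state."""
    if bumped:
        return escape
    if distance_cm < TOO_CLOSE_CM:
        return avoid
    return state


def run(readings):
    """readings: simulated (bumped, distance_cm) pairs, one per loop pass."""
    log = []
    state = initial
    for bumped, distance_cm in readings:
        state = analyse_sensors(state, bumped, distance_cm)
        state = state(log)   # each state does its work and returns the next
    return log
```

Passing the state around as a return value keeps every change to it visible at one call site, so there is no `global` declaration to forget.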

Great minds think alike, and not-so-great minds (like mine) emulate the great minds they see.  (What was that about “standing on the shoulders of giants”, eh?)

I was thinking of something similar, but you think in greater detail than I do.

Right now, I’m busy destroying installations by blowing up GoPiGo3 installs and rebuilding OSes. 🤦‍♂️


I am planning to make “events” atomic.

Once an event receives control, polling stops until the event is resolved.  Therefore there is no “state” monitoring.  The event handler either has control or it doesn’t.  If it doesn’t, the main routine polls for events.  If the event has control, everything else stops for the duration of the event.

This eliminates the need for monitoring the state and passing flags/messages back and forth.

Thanks for the help and ideas!

2 Likes