How To Complicate Saying Hello To Dave

cyclicalobsessive · June 7, 2026, 4:35pm

It is so disrespectful of me to have avoided giving Dave the ability to hear what is going on around him for so long. Dave is over 8 years old, and the ROS-enabled Dave will be 5 years old this month, yet until now I have not given him “Carl’s super power” (speech recognition).

It turns out that, thanks to my experience with Carl, it was relatively easy to install the VOSK speech recognition engine and get Dave listening for a few salutations. What is a bit harder is deciding what Dave’s responses should be, and keeping him from repeating himself the way an overly simple algorithm would.

YouTube “Short” demonstrating Lyrical-Dave’s new “Hello Dave” function:

(Hearing “Good Night Dave” he responds “Good Night. I’m going to sleep now” and exits the Hello Dave program for the time being.)

KeithW · June 12, 2026, 12:19pm

That’s fantastic. Good for Dave.
/K

cyclicalobsessive · June 13, 2026, 1:56pm

How To Complicate Saying “Hello” To Your Robot

Interesting progression of complexity the more I think and read about human-robot greetings:

Simple grammar, simple one to one response, grammar and responses in-line Python
Many to one grammar, simple responses, still all defined in-line Python
Many to one grammar, random non-repetitive responses for generic greeting, single Python script incorporating phrase classification and non-repeat response generation method(s)
Make it a ROS node
Parameter file one to one phrase/simple response map, separate ROS speech recognition server and ROS parameter file based response processor, no helper methods (GitHub/voskros)
Switch to C++ and Behavior Trees
Parameter file grammar, parameter file behavior tree, ROS behavior tree server with helper methods
Above adding ROS Behavior Tree blackboard and ROS topic to blackboard methods and temporal blackboard validity methods
Above adding behaviors that call ROS actions, services, or topic publishers
Above adding “human has stopped near me” sensing and trigger/test (vision and/or LIDAR)
LLM with model-context-protocol agent with “tools” to execute robot functions
Distributed Processing: too large for Raspberry Pi 5
Vision LLM to Evaluate Human In View Emotions and Intentions

cleoqc · June 16, 2026, 6:12pm

That’s awesome! What are you using to generate the audio?

cyclicalobsessive · June 17, 2026, 12:53am

TL;DR: espeak-ng

I’ve been playing around to make Dave’s voice roughly “Minion Like” using speed, pitch, and speaker selection.

Humble-Dave and Kilted-Dave used Piper-TTS, but that did not allow programmatically changing the volume.

For Lyrical-Dave, I have returned to using espeak-ng which allows full programmatic control of pitch, speed, and volume.

Currently Lyrical-Dave uses:

(Example:) espeak-ng -s175 -ven-us+f1 -p 99 -a 50  "This is Dave speaking"

-s175 sets the speaking speed to 175 words per minute (the espeak-ng default)
-ven-us+f1 chooses US English with “Female number 1”
-a 50 sets the volume to 50 for this example. 75% is loud, 99% is shouting, 10% is a very quiet whisper
-p 99 raises the pitch to the maximum possible (the range is 0 to 99)

For actual usage, I spawn espeak-ng in a subprocess and pass in the volume and the phrase to be spoken:

subprocess.check_output(['espeak-ng', '-s175', '-p', '99', '-ven-us+f1', '-a', str(vol), phrase], stderr=subprocess.STDOUT)
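For experimenting with the voice settings, it can help to assemble the command in one small helper. This is only a sketch: build_espeak_command and say are hypothetical names, not part of HelloDave.py, and the clamping assumes the documented espeak-ng ranges (amplitude 0 to 200, pitch 0 to 99).

```python
import subprocess

def build_espeak_command(phrase, vol=50, speed=175, pitch=99, voice="en-us+f1"):
    """Return the espeak-ng argument list for one phrase (hypothetical helper)."""
    vol = max(0, min(200, int(vol)))      # espeak-ng amplitude range is 0..200
    pitch = max(0, min(99, int(pitch)))   # espeak-ng pitch range is 0..99
    return ["espeak-ng", "-s", str(speed), "-p", str(pitch),
            "-v", voice, "-a", str(vol), phrase]

def say(phrase, vol=50):
    """Speak the phrase. Passing a list (no shell) keeps quotes in the phrase safe."""
    return subprocess.check_output(build_espeak_command(phrase, vol),
                                   stderr=subprocess.STDOUT)
```

Because the phrase travels as its own list element, a phrase containing quotation marks or other shell characters is spoken as-is instead of breaking the command.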

I changed the YouTube short linked above to show these specific “Lyrical-Dave” settings in the simple “HelloDave.py” which has:

grammar based VOSK speech recognition on the front end, and
a “Non-Repeating Random Generic Greeting From A List” method for the generic greetings (Hi/Hello/Hey Dave), and
one-to-one “if recognized {GoodMorning|GoodAfternoon|GoodEvening} then repeat back {temporal greeting} to you”
and the special case of “Good Night Dave” that says “Good night, Going To Sleep Now” and exits the program.
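The response side of that list could look roughly like the sketch below. It is not the actual HelloDave.py: the class and function names and the list of generic responses are made up for illustration, and the recognized text is assumed to arrive as a plain lowercase-able string from the recognizer.

```python
import random

GENERIC = {"hi dave", "hello dave", "hey dave"}
TEMPORAL = {"good morning dave": "Good morning",
            "good afternoon dave": "Good afternoon",
            "good evening dave": "Good evening"}
GENERIC_RESPONSES = ["Hello", "Hi there", "Hey", "Howdy"]  # made-up responses

class NonRepeatingChooser:
    """Pick random items, never returning the same item twice in a row."""
    def __init__(self, items, rng=None):
        if len(items) < 2:
            raise ValueError("need at least two items to avoid repeats")
        self.items = list(items)
        self.rng = rng or random.Random()
        self.last = None

    def next(self):
        # Choose only among items that differ from the previous pick.
        choice = self.rng.choice([i for i in self.items if i != self.last])
        self.last = choice
        return choice

def respond(heard, chooser):
    """Return (response, keep_listening) for one recognized phrase."""
    heard = heard.lower().strip()
    if heard in GENERIC:
        return chooser.next(), True
    if heard in TEMPORAL:
        return TEMPORAL[heard], True
    if heard == "good night dave":
        return "Good night. I'm going to sleep now", False
    return None, True   # not a phrase we know
```

Excluding only the last pick is the simplest non-repeat rule; a longer "recently used" history would reduce near-repeats further at the cost of a bit more bookkeeping.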

HelloDave.py uses my “speak honoring quiet-time from 11pm to 10am” library module, which also logs the datetime and what he was asked to say along with either “-spoken” or “-quiet time”.
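The quiet-time test is the one tricky bit, since the 11pm to 10am window wraps past midnight. A minimal sketch, assuming the hours are fixed and treating the log line layout as my guess for illustration (is_quiet_time and log_line are hypothetical names, not the real plib/speak.py):

```python
from datetime import datetime, time

QUIET_START = time(23, 0)   # 11pm
QUIET_END = time(10, 0)     # 10am

def is_quiet_time(now):
    """True from 11pm up to (not including) 10am; the window wraps past midnight."""
    t = now.time()
    return t >= QUIET_START or t < QUIET_END

def log_line(phrase, now):
    """Build a log entry: timestamp, phrase, and what happened to it (assumed layout)."""
    suffix = "-quiet time" if is_quiet_time(now) else "-spoken"
    return f"{now:%Y-%m-%d %H:%M:%S} {phrase} {suffix}"
```

Note the "or" in is_quiet_time: a window that crosses midnight needs "after start OR before end", whereas a same-day window would use "and".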

As mentioned in the prior comment, I am complicating Dave’s life and mine

via daily limited sessions with Claude to learn how to
use Behavior Trees instead of the “if-then-elseif”, and
code a ROS 2 node in C++ instead of my comfortable Python, and
add a Blackboard for posting robot information such as battery state, so that non-speech-reco related behaviors can live in the same behavior-tree-rule-base as the simple “Hello Dave” mappings, and
use ROS “Lifecycle Nodes” which handle
- activating all the components:
  - starting the speech reco engine,
  - reading in and setting the grammar,
  - starting the behavior-tree server,
  - reading in the behaviors, and
- shutting everything down in an orderly and reliable fashion, and
call my ROS 2 “Say Node” which front-ends my “quiet-time and logging” plib/speak.py
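To show the idea of replacing “if-then-elseif” with a tree plus a blackboard, here is a toy sketch in plain Python. It is not the ROS 2 / C++ implementation and does not use any behavior tree library: every class name, the 2 second and 10 second validity windows, and the 10.0 volt threshold are assumptions chosen only for illustration.

```python
import time

SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Blackboard:
    """Shared key/value store; readers may reject entries older than max_age seconds."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}

    def post(self, key, value):
        self.data[key] = (value, self.clock())

    def get(self, key, max_age=None):
        if key not in self.data:
            return None
        value, stamp = self.data[key]
        if max_age is not None and self.clock() - stamp > max_age:
            return None          # too old to trust
        return value

class Condition:
    def __init__(self, test):
        self.test = test
    def tick(self, bb):
        return SUCCESS if self.test(bb) else FAILURE

class Say:
    def __init__(self, phrase):
        self.phrase = phrase
    def tick(self, bb):
        bb.post("say", self.phrase)   # stand-in for calling a speech node
        return SUCCESS

class Sequence:
    """Succeeds only if every child succeeds, ticking left to right."""
    def __init__(self, *children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            if child.tick(bb) == FAILURE:
                return FAILURE
        return SUCCESS

class Fallback:
    """Tries children in order and stops at the first success."""
    def __init__(self, *children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            if child.tick(bb) == SUCCESS:
                return SUCCESS
        return FAILURE

def heard(phrase):
    # A recognized phrase only counts if it was posted in the last 2 seconds.
    return Condition(lambda bb: bb.get("heard", max_age=2.0) == phrase)

def battery_low(bb):
    volts = bb.get("battery_v", max_age=10.0)
    return volts is not None and volts < 10.0   # made-up threshold

tree = Fallback(
    Sequence(Condition(battery_low), Say("My battery is low")),
    Sequence(heard("hello dave"), Say("Hello")),
    Sequence(heard("good night dave"), Say("Good night")),
)
```

The point of the structure is that the battery rule and the greeting rules sit side by side in one tree and read the same blackboard, so adding a non-speech behavior means adding a branch rather than threading another condition through an if-chain.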

With these settings - the new “Short”:

KeithW · June 19, 2026, 2:24pm

That’s fantastic.
/K