Voice control (using Raspberry Pi)
by Mirko
Done with 1 Thymio
Keywords: Voice, Raspberry
Construction projects: None
Proof of concept combining a Thymio II with a Raspberry Pi running Jasper (an open-source voice-control platform). Speech-to-text (STT) and text-to-speech (TTS) are both performed on the Raspberry Pi (i.e. without a web API such as those from Google). As you can see in the video, the voice recognition still needs some fine-tuning to understand my accent…
Needs a USB microphone (a webcam microphone can do the trick) and optionally a loudspeaker (to hear the robot's feedback). Figuring out the audio input can be tricky. In particular, if you use the default STT tool, Pocketsphinx, your microphone has to support a sample rate of 16000 Hz (my very cheap webcam only supported *higher* rates, and downsampling worsened the audio quality, so I got another webcam, still one of the cheapest).
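One way to see what a microphone actually delivers is to make a short test recording (for instance with arecord) and look at the header of the resulting WAV file. The following is only a minimal sketch, assuming Python 3 and a test file you recorded yourself; the helper names `wav_format` and `is_usable` are made up for this illustration and are not part of Jasper or Pocketsphinx.

```python
import wave

# Sample rate mentioned above as required when using Pocketsphinx.
REQUIRED_RATE = 16000  # Hz


def wav_format(path):
    """Return (sample_rate_hz, channels, bits_per_sample) from a WAV header."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate(),
                wav.getnchannels(),
                wav.getsampwidth() * 8)


def is_usable(path, required_rate=REQUIRED_RATE):
    """True if the recording was stored at the required sample rate."""
    rate, _channels, _bits = wav_format(path)
    return rate == required_rate
```

Calling `is_usable("test.wav")` on a recording that ended up at a higher rate returns False. Note that this only inspects the file as written; it cannot tell whether the audio was resampled somewhere between the hardware and the file.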
More details:
- Thymio II
- Raspberry Pi 2 Model B, running plain Raspbian Jessie (release date 2015-09-24), optionally with a power bank so that the whole thing is autonomous
- Aseba (on the Raspberry Pi, works right away :-)
- Jasper (https://jasperproject.github.io/). Installing it was not straightforward, because the install instructions don't reflect the current versions of the Raspberry Pi and of some of the required or optional tools. I followed the manual installation instructions (Method 3) and first ran into a discrepancy with the ALSA configuration (there is no alsa-base.conf in the current Raspbian, but it looks like ALSA works anyway without any edits). Next, for Pocketsphinx (speech-to-text), the apt-get approach didn't seem to work with Jasper, so I had to compile it from source as described. The next hurdle was OpenFST, which was not installed as a dependency of Phonetisaurus and is no longer available at the location given. You can get the correct version directly from http://www.openfst.org/twiki/bin/view/FST/FstDownload and then have to compile it. Another file that is no longer available at the documented location is the Phonetisaurus model. I got it from https://www.dropbox.com/s/kfht75czdwucni1/g014b2b.tgz?dl=0.
Caveat: I did not manage to get the Google or Wit.ai STT options to work (and didn't like the pricing of AT&T). I also failed to find, or to get working, some of the TTS options.
Jasper lets you write your own Python "modules", and it's pretty easy to start from an existing module and then add the Thymio interaction, as per some of the other Thymio projects on this site (see example attached). Asebamedulla needs to be running in the background, of course (asebamedulla "ser:device=/dev/ttyACM0" &).
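To give an idea of what such a module can look like, here is a rough sketch (not the attached example). It assumes the usual Jasper module layout (a WORDS list plus isValid and handle functions) and the D-Bus interface that asebamedulla exposes in other Thymio projects (service ch.epfl.mobots.Aseba, interface ch.epfl.mobots.AsebaNetwork, node name "thymio-II"). The command words, the speed values, the helpers `motor_targets` and `connect`, and the extra `network` argument are all choices made for this illustration.

```python
import re

# Words Jasper should listen for (illustrative choice).
WORDS = ["FORWARD", "BACKWARD", "LEFT", "RIGHT", "STOP"]

# (left, right) motor targets per command; the values are arbitrary examples.
SPEEDS = {
    "FORWARD": (200, 200),
    "BACKWARD": (-200, -200),
    "LEFT": (-100, 100),
    "RIGHT": (100, -100),
    "STOP": (0, 0),
}

_PATTERN = re.compile(r"\b(" + "|".join(WORDS) + r")\b", re.IGNORECASE)


def isValid(text):
    """Tell Jasper whether this module wants to handle the transcribed text."""
    return bool(_PATTERN.search(text))


def motor_targets(text):
    """Map the first recognised command word to (left, right) speeds."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return SPEEDS[match.group(1).upper()]


def connect():
    """Connect to asebamedulla over D-Bus (assumed names, needs python-dbus)."""
    import dbus  # imported lazily so the module loads without D-Bus present
    bus = dbus.SessionBus()
    obj = bus.get_object("ch.epfl.mobots.Aseba", "/")
    return dbus.Interface(obj, dbus_interface="ch.epfl.mobots.AsebaNetwork")


def handle(text, mic, profile, network=None):
    """Called by Jasper; 'network' is an optional hook for testing."""
    targets = motor_targets(text)
    if targets is None:
        mic.say("Sorry, I did not understand.")
        return
    if network is None:
        network = connect()
    left, right = targets
    network.SetVariable("thymio-II", "motor.left.target", [left])
    network.SetVariable("thymio-II", "motor.right.target", [right])
    mic.say("OK")
```

Keeping the text-to-speed mapping in its own function makes it possible to try out the command parsing on a desktop machine without a robot, a microphone or a running asebamedulla.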
By the way, I'm confident this setup will also work for other languages, or at least French and German (that's what I'll be trying next, because my English accent evidently sucks ;-).
Comment this creation!
Really nice one!
But I miss the red light going left and right like in K2000 ;)
Excellent idea, I'll add that right away :-)
Hi, I followed your idea and tried to replicate what you show in your video. But every time I launch my Python program without being on the desktop, I get this error:
Unable to autolaunch a dbus-daemon without a $DISPLAY for X11
How did you get around that problem?
Thanks for your response :)