Babel Stix - Simple Implementation of the Babel Fish Idea


Here is a little homebrew project to simulate the idea of the 'Babel Fish' as described in The Hitchhiker's Guide to the Galaxy.
What we're after is something that 2 people speaking different languages can put in their ears to allow them to understand each other. It's a basic example of automatic language translation. We're keeping it low-budget, simple, open and free here.
Don't expect miracles - the setup essentially works and will do very basic spoken language translation, but even then it does suffer a bit from the 'hovercraft is full of eels' effect, which is par for the course in speech recognition and translation right now in any case.

The Approach

We give a bluetooth headset to each of the 2 people to wear. A pocket/wearable computer will record what's being said on each headset, do some rudimentary speech recognition on it, translate it and pipe the spoken translation back out to the other headset. So we've got several interesting technologies in use in the project:
  • Bluetooth - recording and playing back speech via bluetooth headsets
  • Speech recognition - working out what each person is saying
  • Translation - going from speech in one language to another
  • Speech synthesis - generating an appropriate phrase in the new language

    Ingredients: The Hardware

    The setup we will use is fairly low cost, and the parts are readily available from places like Amazon and eBay. We're using:
  • A Gumstix Verdex XM4.

    This is a super-compact ARM-based computer running Linux, with a good free open-source development and support environment.
    Ideally we'd be using one of the latest Bluetooth-enabled Linux smart-phones to do the job here, but they seem to cost a fortune currently (even more so for the development kits) and community support for them seems to be generally lacking at the moment.
    The Gumstix is a good development platform. It uses the same type of low-power processor as current smartphones (ARM) and has a comparably small amount of memory and storage (64 MB RAM / 16 MB flash), so it makes a nice stepping-stone to getting the code from here working on a Linux smartphone later.
    Note: Some gumstix boards have bluetooth built in. I'd recommend avoiding these for now since 1) there isn't currently audio support for it in the Linux kernel and 2) the antenna for it seems fragile. Also stick to the newer 'Verdex' models since only these have the proper USB-host support for a USB bluetooth dongle.
    Since the verdex is so small, it needs an expansion board to plug into to bring out power and USB connectors. For development I'm using the console-vx board to provide power, USB and serial-port connections. I'm going to move to the much smaller breakout-vx board soon though.
    (The gumstix has no connectors for screen or keyboard - you set it up, configure it etc from another computer over a serial cable initially.)


    The gumstix initially runs off mains electricity via a power brick that comes with it, but with a bit of soldering it is possible to rig up a small battery pack to make it portable. It runs on 3.5 volts at under 0.5A, so 4x NiMH rechargeable AA batteries will run it for about 5 hours. It's better to just run it from a wall socket for starters though.
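
    As a rough check on the 5-hour figure, here is a back-of-envelope sketch. The 2500 mAh cell capacity and the series wiring are my assumptions (typical for a 4x AA NiMH pack), not measured figures, and the 0.5A draw is the upper bound quoted above. Regulator losses are ignored.

```python
def runtime_hours(cell_capacity_mah, current_draw_ma):
    """Estimated runtime of a series battery pack, in hours.

    Cells wired in series add their voltages, not their capacities, so the
    pack capacity is taken to be the capacity of a single cell.
    """
    return cell_capacity_mah / current_draw_ma


# Assumed figures: 2500 mAh NiMH AA cells, 500 mA worst-case draw.
print(runtime_hours(2500, 500))  # 5.0
```

    Since the real draw should be under 0.5A, the actual runtime may come out somewhat longer than this estimate.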
    You could use a laptop/desktop here as the computer part, but this way I don't have to muck about getting Linux/Bluetooth running on random hardware, and this approach keeps the project portable/pocketable. Given the convenience and price of the gumstix and the other components, it makes sense to stick to a standard platform to set things up on. The code available here is aimed at running on a gumstix computer; however, it should run fine on any desktop/laptop Linux system with working Bluetooth. All source code is available, and it's all so short and simple it shouldn't present too much trouble.

  • 2 Bluetooth headsets to record/playback the speech we're translating. I've tried a couple of these and found that the Motorola H500 headsets seem to work OK. (If you know better let me know! Here are more supported headsets.) They're compact, can be charged via mini-USB (saves having lots of extra power bricks around - I'm short of space as it is) and work fairly well with the open-source software we'll be using. To be honest, if you're just playing around with the project yourself you can use just one headset, but 2 are needed for a proper setup.

    Warning - I have a sneaking suspicion that newer H500 headsets have different firmware that doesn't work with the gumstix. I recently bought a brand-new H500 headset as a spare. It is black, with a black center instead of silver as with my other 2. When I try to use carwhisperer with it, I get a "SCO Connection refused" error. It might be best to stick with older versions of this headset, or use a different model altogether.
    An alternative approach would be to use a single microphone and speakers here, but in speech recognition land, headsets seem to be de rigueur these days for getting usable recorded speech, and we're trying to copy the Babel Fish after all.

    Our 2 Motorola H500 headsets, fishified for the project

  • A Bluetooth dongle - This lets the gumstix talk Bluetooth to the headsets. Again these are fairly inexpensive. I've tried a couple of them, and had trouble getting the no-name ones from eBay working. The Sitecom ones work pretty well (although they seem to add an annoying buzz to recordings, nothing we can't filter out though) and again are available on eBay/Amazon cheaply. I've tried models CN-520 and CN-512 and both of these seem OK.

    Sitecom Bluetooth dongle

  • A USB thumb drive. This is just for convenience to help copy over code to the gumstix. Just about any size/model is fine. If you have a spare one knocking around, all the better.


    Ingredients: The Software

    The package is aimed at running on a gumstix computer, but should run fine on a standard Linux PC with bluetooth working on it (i.e. the carwhisperer program working properly on it beforehand). Both arm9 and i386 binaries are provided in the zip file. I'm doing development on it on a desktop PC, so having things work on both platforms is handy. You can even skip the bluetooth bit and use a wired microphone on a desktop PC instead to just play with the recognition side of the project, details in the package readme.

    The gumstix is well supported, and there's plenty of free open source software available to use on it. For the development side, again there are free tools to make use of here.
  • buildroot - This is a large suite of development software installed on a desktop Linux PC to compile applications for transfer to the gumstix.
  • Recording and playing back from Bluetooth headsets - we use a small and simple program called 'carwhisperer' for this. This was originally written as a hacker tool to demonstrate security weaknesses in Bluetooth headsets, but it works fine as a basic recording/playback tool too for our purposes.
  • Speech processing - we use a modified version of the free open-source package 'Speex', which has been specifically designed to handle speech files. We only use this to clean up our Bluetooth recordings here though. The star of the show is LPC10, an aggressive speech compression tool (and over 16 years old!).

    A run through of how it works

    Here is a run through of what's going on when the program is working. I won't go into too much detail or get too technical on it. A top-level algorithm would be:
    1) Open a connection to headset 1, play a welcome message and record a spoken phrase.
    2) Do some initial cleaning up of the recording to remove noise, hiss etc.
    3) Break up the recording into individual words, then for each word:
      3a) compress the recording so that it is still understandable, but so we have less data to look at.
      3b) compare each (compressed data) frame with known frame sounds from a codebook to get a list of the 'phoneme' sounds that make up the word.
      3c) find the best match word based on a 'likelihood' rulebook for our particular language.
      3d) add this word in the translated language to the output phrase.
    4) Switch to using the other headset, play back the whole translated phrase and record the reply.
    5) Loop back to 2.
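
    The steps above can be sketched as a short loop. This is only an illustration of the control flow, not the project's actual code: every function name here is hypothetical, the real stages are passed in as plain callables, and the demo fakes 'recordings' with text strings so that it runs without any Bluetooth hardware.

```python
def run_conversation(record, play, clean, split_words, recognise, translate):
    """Half-duplex translation loop: only one headset is in use at a time."""
    speaker = 0
    recording = record(speaker)                  # step 1: record a phrase
    while recording is not None:
        cleaned = clean(recording)               # step 2: remove noise, hiss
        phrase = []
        for word_audio in split_words(cleaned):  # step 3: one word at a time
            word = recognise(word_audio)         # steps 3a-3c
            phrase.append(translate(word))       # step 3d
        speaker = 1 - speaker                    # step 4: switch headsets,
        play(speaker, phrase)                    # play the translation,
        recording = record(speaker)              # and record the reply
        # step 5: the while loop takes us back to step 2


# Stand-ins for the real stages: text strings play the part of audio.
scripted = {0: ["one two"], 1: ["trois"]}
played = []
WORDS = {"one": "un", "two": "deux", "trois": "three"}


def record(headset):
    """Return the next scripted phrase, or None when nothing is left."""
    return scripted[headset].pop(0) if scripted[headset] else None


def play(headset, phrase):
    played.append((headset, phrase))


run_conversation(record, play, str.strip, str.split, lambda w: w, WORDS.get)
print(played)  # [(1, ['un', 'deux']), (0, ['three'])]
```

    Passing the stages in as functions keeps the loop itself tiny, and means each stage can be swapped out or tested on a desktop PC on its own.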
    

    It would be nice to be recording/translating/playing back on both headsets at the same time (i.e. 'full duplex'), but because of limitations in the current Bluetooth drivers, we only allow recording/playback on one headset at a time ('half duplex'), so the 2 people have to take turns to speak. Full duplex might be do-able on a more powerful setup though (and once the Bluetooth stack supports eSCO properly).
    I'll go into detail on some of these items:

    2 Clean up recording - You will notice there is a rather nasty buzz present on recordings made with this setup. I suspect the Bluetooth dongle I'm using, but I don't know for sure whether it's a headset problem, a Bluetooth adapter problem, a driver problem or what. It's a pest though; if anybody is getting better quality recordings than the examples on this site, please let me know how. It's possible to clean up the recording prior to processing it, which is a good idea anyway whether there is a buzzing present or not.
    The way to do this is to use the '--denoise' flag in speexenc - the Speex compression tool - to do some Speex/FFT-based filtering on our recording. It gives really good results. A faster alternative is quick-and-dirty time-domain modeling and filtering of the buzz with a short C program (filter.c). It doesn't sound quite as good but runs fast as hell.
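
    To give a feel for what time-domain buzz filtering can look like, here is a minimal sketch. It is not a port of filter.c, and it rests on an assumption of mine: that the buzz repeats exactly every `period` samples. It models one cycle of the buzz by averaging, then subtracts that model from the recording.

```python
def remove_periodic_buzz(samples, period):
    """Subtract a periodic buzz from a list of audio samples.

    Averaging all samples that share the same position within the buzz
    cycle gives an estimate of one cycle. Speech is unrelated to the buzz,
    so over a long recording it tends to average out of the estimate.
    """
    sums = [0.0] * period
    counts = [0] * period
    for i, sample in enumerate(samples):
        sums[i % period] += sample
        counts[i % period] += 1
    template = [sums[p] / counts[p] if counts[p] else 0.0
                for p in range(period)]
    return [sample - template[i % period]
            for i, sample in enumerate(samples)]


buzz = [3, -1, -2, 0] * 50           # synthetic buzz with a 4-sample period
cleaned = remove_periodic_buzz(buzz, period=4)
print(max(abs(s) for s in cleaned))  # 0.0
```

    On a real recording the period would have to be measured first, and the result will only be as good as the 'buzz is perfectly periodic' assumption.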

    3a,3b,3c - the speech recognition bit gets its own page
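
    Just to illustrate the codebook idea in 3b here: the sketch below labels each frame with the closest codebook entry. The frame values, the labels and the squared-distance measure are all made up for the example (real frames would be LPC10 parameters), and collapsing repeated labels is my own simplification, so treat the recognition page as the reference for how it is really done.

```python
def nearest_phoneme(frame, codebook):
    """Return the codebook label whose reference frame is closest to `frame`."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(codebook, key=lambda label: distance(frame, codebook[label]))


def frames_to_phonemes(frames, codebook):
    """Label every frame, then collapse runs of the same label into one."""
    labels = []
    for frame in frames:
        label = nearest_phoneme(frame, codebook)
        if not labels or labels[-1] != label:
            labels.append(label)
    return labels


# Made-up codebook: label -> reference frame (not real LPC10 parameters).
CODEBOOK = {"w": (1.0, 0.0), "uh": (0.0, 1.0), "n": (1.0, 1.0)}
frames = [(0.9, 0.1), (1.1, 0.0), (0.1, 0.8), (0.9, 1.2)]
print(frames_to_phonemes(frames, CODEBOOK))  # ['w', 'uh', 'n']
```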

    3d Swap recognized word for translated word - This is the simplest way to do speech translation, on a word-for-word basis. It works on a basic level, but leads to crappy translation for longer phrases, for example the classic 'The vodka is good but the meat is rotten' translation of 'the spirit is strong but the flesh is weak'. Phrase-level and sentence-level translation over and above simple word swapping would help here and wouldn't be too difficult to implement. The speech synthesis side is just stringing the required word samples together and playing the result back.
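
    Word swapping and sample stringing are simple enough to show in a few lines. This is a sketch with hypothetical names and a tiny hand-typed vocabulary, and the 'samples' are short lists of numbers standing in for real recorded audio.

```python
# Tiny English -> French number vocabulary, for illustration only.
EN_TO_FR = {"one": "un", "two": "deux", "three": "trois",
            "four": "quatre", "five": "cinq"}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}


def translate_words(words, table):
    """Swap each recognised word for its translation; keep unknown words."""
    return [table.get(word, word) for word in words]


def synthesise(words, samples):
    """Build the output audio by joining the recorded sample of each word."""
    audio = []
    for word in words:
        audio.extend(samples[word])
    return audio


french = translate_words(["two", "five"], EN_TO_FR)
print(french)  # ['deux', 'cinq']

# Stand-in audio: each word's 'sample' is just a short list of numbers.
SAMPLES = {"deux": [1, 2], "cinq": [3]}
print(synthesise(french, SAMPLES))  # [1, 2, 3]
```

    Because the table only knows single words, this has exactly the word-for-word weakness described above.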

    The setup in practice

    To keep things simple, I'm only doing translation of numbers between English and French for now. Having a small pool of words to work with makes the recognition and translation a lot easier and more reliable. It should be easy to add other European languages and increase the pool of words later on.

    Installing the Hardware/Software

    Detailed instructions on doing the install on a gumstix are on this page.
    (On a Linux PC with working Bluetooth, you just need to download and unzip the package into your home directory and run it from there.)

    Current/Future Work on the project

    Currently I'm working on improving the recognition engine (I have only spent a couple of hours knocking up the rough versions of the codebook and rulebook files used just now). I will also look at using pocketsphinx as a recognition engine instead; it seems to be coming on in leaps and bounds, although it seems to have a pretty steep learning curve.

    Feedback

    If you have any feedback or comments on the project, feel free to email me: brian@shapeseeker.com. This address is also my PayPal tip jar :o)
    (I'm also in the job market soon... )