HOW TO GET IBM'S VIAVOICE DICTATION SOFTWARE TO GO ON SUSE LINUX 7.X
====================================================================

By Volker Kuhlmann <v.kuhlmann@elec.canterbury.ac.nz>

Created 9, 10 September 2001
Updated 11 Sep 2001: links section
Updated 18 Oct 2001: IBM Java version
Updated 05 Nov 2001: SuSE 7.3
Updated 22 Nov 2001: clarified some points
Updated 04 Jan 2002: how to enroll correctly; reducing waits


Contents

1. Introduction
2. Software components
   2.1 Java
3. Installation
   3.1 SuSE 7.3
4. Bug fixing before wasting time on anything else
   4.1 /usr/bin/vvsetenv
   4.2 /bin/sh
   4.3 /usr/bin/vvstartdictation
   4.4 /usr/bin/vvstartuserguru
   4.5 /etc/viavoiceps.conf
   4.6 $HOME/viavoice/
   4.7 Creating a ViaVoice user
   4.8 Enrollment (Creating a Voice Model)
   4.9 Uninstalling ViaVoice
5. Proceed as per instructions
   5.1 Sound drivers
   5.2 Audio mixer
   5.3 Adjustment of audio setup
6.0 Start dictation
7.0 To Do
A. Links
B. Quick Start


1. Introduction
----------------

Apart from some projects which don't look too promising, IBM's ViaVoice
Dictation is the only voice recognition software available for linux. It
consists of two parts: the recognition engine, and a user interface. The
user interface is programmed in java, handles user registration +
training, and allows dictation into a text field. Compared with software
available for Microsoft windows, the dictation interface is not very
advanced, though (some of the?) source code is available for anyone
wishing to improve on it.

The recognition engine is available for free download from IBM together
with a time-limited evaluation license and a software development kit.
There are projects which make use of the recognition engine, most
notably xvoice (http://download.sourceforge.net/xvoice/). This split
into engine and user interface is very smart and leaves room for some
interesting possibilities of making use of the engine.

The dictation software and engine runtime can be purchased from IBM for
US$60.
It includes some kind of headset and is only available for US English.
Unfortunately, IBM only sells it to people in the USA. Whatever the
reason for this daft decision is, we'll have to live with it for the
foreseeable future. (It is perhaps unwise to complain to the speech
recognition group about this via their mailing lists; these people are
very helpful and have probably not made that decision.) Finding a dealer
in the USA who is willing to ship a box to New Zealand for a lesser fee
than FedEx is asking is not something I'd like to get into if I can
avoid it.

An alternative (the only alternative?) is to go via Mandrake. Mandrake
has licensed viavoice for linux for US + UK English, German and French,
and there seem to be some hooks for Spanish. Obviously this is not
included in the GPL version of Mandrake, only in the powerpack. The
powerpack works out at about the same price as IBM's dictation pack,
albeit without headset. I would have preferred to pay my support
directly to IBM, but sorry - see above.

PLEASE NOTE that the instructions in here are SPECIFICALLY FOR THE
VIAVOICE FROM MANDRAKE. Mandrake 8.0 is the one I have. Because I have
no copy of IBM's boxed set available, I am unable to tell to what extent
this applies to the boxed set. It is probable that some of it will, but
feedback (thanks Jim Mohr) has it that not all of it is directly
applicable.

Thanks also to David Morgan who said the instructions in here are also
useful for making ViaVoice work on Mandrake. There are some minor
differences: (section 4.1) SPCH_JAVA needs to be set to
/usr/lib/java-1.3/jre, and the line with LD_LIBRARY_PATH is somewhere
else - remove it anyway.


2. Software components
-----------------------

a) From IBM / Mandrake:

The viavoice software is from CD 4 (Commercial Apps 2) of a Mandrake
linux powerpack 8.0, ISBN 1-57595-493-1.
File dates (in GMT), sizes, md5 sums etc:

      5675307 2001-04-18 10:56:28 ViaVoice_Dictation-1.1-0.0.i386.rpm
      1287476 2001-04-18 10:56:29 ViaVoice_TTS_rtk-5.1-1.2.i386.rpm
      2846764 2001-04-18 10:56:29 ViaVoice_runtime-3.1-0.0.i386.rpm
    185041363 2001-04-18 10:58:34 ViaVoice_runtime_US_LangPack-3.1-0.0.i386.rpm

    f01b9a13da086e63c006891686c2f619  ViaVoice_Dictation-1.1-0.0.i386.rpm
    0aafee2bb6fb82b89abcf8f2c1baefc8  ViaVoice_TTS_rtk-5.1-1.2.i386.rpm
    70c5a72d1c17794b6953e8de5924e293  ViaVoice_runtime-3.1-0.0.i386.rpm
    8355ddf8db19ae0e4ff5620649257691  ViaVoice_runtime_US_LangPack-3.1-0.0.i386.rpm

These rpm packages are compiled and packaged by IBM (unless the package
information is forged):

    > cd /media/cdrom/Mandrake/RPMS4
    > rpm -qp --queryformat "%-28{name}, %-3{packager}, %-1{buildhost}\n" Via*
    ViaVoice_Dictation          , IBM, linuxbuild1.bocaraton.ibm.com
    ViaVoice_TTS_rtk            , IBM, linuxbuild5.bocaraton.ibm.com
    ViaVoice_runtime            , IBM, linuxbuild1.bocaraton.ibm.com
    ViaVoice_runtime_US_LangPack, IBM, linuxbuild1.bocaraton.ibm.com

According to Damon Lynch, the IBM dictation pack contains:

    ViaVoice_Dictation-1.0-1.0
    ViaVoice_TTS_rtk-5.1-1.2
    ViaVoice_runtime-3.0-1.2
    ViaVoice_sdk-3.0-1.1

So the software shipped with Mandrake is actually slightly newer. Apart
from having some problems fixed, it also contains new bugs...

b) From SuSE:

The install is on a stock SuSE 7.2 system, running the supplied
2.4.4-4GB kernel. The CDs contain the Application ID in the iso9660
filesystem:

    SuSE-Linux-Professional-INT-i386-7.2.0#0

Also tested on a stock SuSE 7.3 system with the supplied 2.4.10-4GB
kernel. Application ID of the CDs:

    SuSE-Linux-Professional-INT-i386-7.3.0#0

2.1 Java

IBM clearly states that dictation requires java 1.2.2rc4 from
www.blackdown.org. This is certainly true for IBM's boxed set of
dictation. The rpms shipped with Mandrake contain a dependency on IBM's
java2 1.3, as shipped with Mandrake and SuSE.
As the rpms are packaged by IBM, one can assume that ViaVoice dictation
3.1 will now also run with IBM's java.

The version of IBM's java shipped with SuSE 7.2 is just fine:

    15983791 2001-05-16 10:37:17 IBMJava2-JRE-1.3-45.i386.rpm

In July 2001 SuSE released an updated java version
(http://www.suse.de/de/support/download/updates/72_i386.html,
IBMJava2-JRE-1.3-67.i386.rpm) but I have not tried this myself
(upgrading to SuSE 7.3 will show). It is unlikely that there are
problems.

The version of IBM's java shipped with SuSE 7.3 is fine as well:

    17263096 2001-09-23 20:51:13 IBMJava2-JRE-1.3-109.i386.rpm


3. Installation
----------------

This is very easy: install package IBMJava2-JRE from SuSE CD 5, and the
4 viavoice rpms from Mandrake CD 4. Use yast, yast2, or just type:

    rpm -Uvh /media/cdrom/full-names/i386/IBMJava2-JRE-1.3-45.i386.rpm
    rpm -Uvh /media/cdrom/Mandrake/RPMS4/ViaVoice*

This requires about 280MB of disk space, and doesn't include any files
created while running viavoice.

Some minor things need to be fixed, run (as root):

    chmod a+r /etc/viavoiceps.conf /usr/lib/menu/vv*

Ignore the several errors you see; the rpm install scripts are buggy.
(Who had this idea of calling gless? Pity the license doesn't show if
gless doesn't exist, like on a KDE or non-Gnome system...)

IBM's boxed set probably doesn't contain the 2 files /usr/lib/menu/vv*;
they're Mandrake-specific and useless on a SuSE system. Delete them if
you like.

If you think you're done now, better think again and read section 4!

3.1 SuSE 7.3

Proceed as described in here for SuSE 7.2. Before starting viavoice,
change variable SPCH_JAVA (towards the end) in /usr/bin/vvsetenv from

    export SPCH_JAVA=/usr/lib/jre1.3/jre

to:

    export SPCH_JAVA=/usr/lib/jdk1.3/jre


4. Bug fixing before wasting time on anything else
---------------------------------------------------

Before the effort is of any use, we need to do some serious tidying up.
All these modifications are contained in a patch file which can be
downloaded from http://volker.orcon.net.nz/, see exact URL at end. Apply
the patch with (as root):

    umask 22
    patch -b -p0 <viavoice-3.1-patch.diff
    rm /usr/bin/vvuser.orig /usr/bin/viavoice.orig
    chmod 755 /usr/bin/vvuser /usr/bin/viavoice

4.1 /usr/bin/vvsetenv

Add a line (as root):

    export SPCH_JAVA=/usr/lib/jre1.3/jre

This, once we have made further modifications, will tell the various
parts of dictation where to find the java we want/need to use. IBM
should really make use of such a variable! Especially vvstartdictation
will just die otherwise...

While you're at it, delete the line starting with LD_LIBRARY_PATH (it
does nothing), and move the line

    #!/bin/bash

to be the very first in the file (otherwise it's useless, and yes, as
vvsetenv is only sourced it's still useless at the top).

Add these lines (as root):

    export SPCH_WAITEXIT=1
    export SPCH_WAITJAVA=0

This, together with modifications in other shell scripts, reduces the
time for which viavoice sits there waiting and doing absolutely nothing.
It now starts and exits much faster. (These modifications are included
in the patch, but not listed any further in this HOWTO.)

4.2 /bin/sh

Various startup scripts do not specify which shell they need to run
under, so the system will take the user's default. If that happens to be
e.g. tcsh, viavoice dictation will only ever spit some garbage onto the
screen and terminate after a second or two. To fix this, add as very
first line (as root):

    #!/bin/sh

to these scripts:

    /usr/bin/vvstartaudiosetup
    /usr/bin/vvstartenrollment
    /usr/bin/vvstartuserguru
    /usr/lib/ViaVoice/bin/vvsetuser

4.3 /usr/bin/vvstartdictation

This script contains a major bug: it runs the java from
/usr/lib/ViaVoice/IBMJava2-13; unfortunately, Mandrake installs the java
into /opt/IBMJava2. This means that the reason why ViaVoice starts at
all on Mandrake is due to pure chance!
SuSE installs the java somewhere else again, so we deal with this
properly and make use of the variable SPCH_JAVA introduced into vvsetenv
for this reason. Change the line (towards the end, and as root)

    export PATH=/usr/lib/ViaVoice/IBMJava2-13/jre/bin:$PATH

to

    export PATH=$SPCH_JAVA/bin:$PATH

4.4 /usr/bin/vvstartuserguru

This is programmed in java as well, and likewise needs to be told
properly where to find the java to use. Change the line towards the end
(as root)

    export PATH=/opt/IBMJava2-13/jre/bin:$PATH

to

    export PATH=$SPCH_JAVA/bin:$PATH

4.5 /etc/viavoiceps.conf

This is the absolute bummer of a bug. If this file is missing, running
vvstartuserguru (which is the first thing to do after installation) will
terminate with the completely bogus error "The Speech System is in use
by another application". And that before it even attempts to open the
sound device! Of course there isn't another program using the sound
system either (this can be tested with lsof +D /dev *as root*).

Furthermore, this file is not contained in any of the viavoice rpms, nor
is it created directly by the rpms' installation scripts. By a minor
miracle, this file turns up out of nowhere when installing the rpms,
although I'm not sure whether that is always the case. Check if you have
it.

It's easiest to download the file (see Links section); this file is
*not* included in the patch.
Otherwise, run as root (the indentation is tabs - fix it if necessary,
or download the file):

cat >/etc/viavoiceps.conf <<'EOF'
#############################################
# Generated options file - do not edit
#############################################
ViaVoice Runtimes/RTConfig
	bin = /usr/lib
	langs = /usr/lib/ViaVoice/vocabs/langs/%s
	maps = /usr/lib/ViaVoice/vocabs/langs/%s/map
	macros = /usr/lib/ViaVoice/vocabs/langs/%s/macros
	vocabs = /usr/lib/ViaVoice/vocabs
	sharedbin = /usr/lib
ViaVoice Runtimes/RTConfig/Runtimes/Reco
	version = 3.1.0.0
ViaVoice Runtimes/RTConfig/Runtimes/Reco/Languages
	En_US = 1
ViaVoice Runtimes/RTConfig/Runtimes/Control
	version = 3.1.0.0
ViaVoice Runtimes/RTConfig/Runtimes/Control/Languages
	En_US = 1
/
VoiceType\Audio\Devices
VoiceType\Audio\Devices\StdMicJack
	AudioDevInfoVersion = 1
	EnrollSamplingRate = 22
	DictationDeviceType = Microphone
	EnrollDeviceType = Microphone
	AudioSetupDeviceType = Microphone
VoiceType\Audio\Devices\StdMicJack\En_US
	DescriptionText = Microphone
EOF
chmod 644 /etc/viavoiceps.conf

As Damon Lynch tells me, this problem does not occur with the dictation
rpms version 3.0 which are in IBM's boxed set.

4.6 $HOME/viavoice/

This directory is created by dictation sometime during the initial
configuration, when running vvstartuserguru. It has the extremely
annoying effect of causing dictation's java program to segfault early
on, or to die with some other error. The fix is easy (as a normal user):

    rm -rf $HOME/viavoice

At this early stage there aren't any useful files in there, but keep in
mind that this directory contains your viavoice users, voice models, and
voice training results later on.

(IBM: There is more debugging info, see Links section.)

4.7 Creating a ViaVoice user

As per /usr/doc/ViaVoice/en_us.rt.readme.txt, running vvstartuserguru
should finally adjust the sound system and create a ViaVoice user.
Unfortunately, it segfaults when $HOME/viavoice exists, and it also
segfaults when $HOME/viavoice does not exist.
Catch 22. No comment as to programming quality.

Luckily, there is another way. Make sure /etc/viavoiceps.conf exists, as
per above. Then run vvuser. It's not in the path; we'll take care of
that and at the same time make it a stand-alone program. Run this (as
root):

cat >/usr/bin/vvuser <<'EOF'
#!/bin/sh
# Run the program of ViaVoice which deals with creating/deleting ViaVoice
# users and voice models. Make this a stand-alone program.
# Volker Kuhlmann
#   9 Sep 2001
source vvsetenv
exec /usr/lib/ViaVoice/bin/vvuser "$@"
EOF
chmod 755 /usr/bin/vvuser

This file is created when applying the patch, but you still need to
chmod it.

Now create a ViaVoice user (as a normal linux user):

    vvuser -adduser "Any name you like to be called by ViaVoice" -setdefault

This has also created a voice model for the user. You should create a
new voice model (and repeat the enrollment) each time you change the
microphone or move to a different location.

    vvuser -addvoicemodel "My other desk" -setdefault

As there is no way to add more users or voice models using the graphical
user interface, this is a handy program to know of.

4.8 Enrollment (Creating a Voice Model)

IBM says to use vvstartuserguru for this; however, it crashes the very
first time (java segfault). vvstartuserguru allows neither creating
additional users nor creating additional voice models for existing
user(s), see 4.7 above. Use vvuser.

After running vvuser, you must still run vvstartuserguru. Although the
audio setup and the story reading can be performed with
vvstartaudiosetup and vvstartenrollment, a short speech sample is not
processed without running vvstartuserguru.

NOTE (read this twice!): RUNNING VVSTARTENROLLMENT AFTER VVUSER WILL NOT
PROCESS THE SHORT VOICE SAMPLE BEFORE YOU READ THE STORY; AS A RESULT OF
SPENDING THE TIME TO READ THE STORY, THE RECOGNITION ACCURACY WILL DROP
TO ONE RANDOM WORD PER SPOKEN SENTENCE, I.E. ZERO.
There is no warning about this; you will just be wasting your time and
then waste even more time trying to figure out why it doesn't work. Been
there, done that. I would gladly spend quite a bit more money on a
product which was engineered soundly!

4.9 Uninstalling ViaVoice

This is only listed here to mention another bug. Removing the ViaVoice
rpms fails because the script(s) these rpms contain are buggy. To clean
up afterwards, run these commands as root:

    rpm -e ... (the 4 ViaVoice rpms)
    rpm -e --noscripts ViaVoice_runtime_US_LangPack
    rm -rf /usr/lib/ViaVoiceTTS /etc/viavoiceps.conf

Remove this line from /etc/profile (and/or the startup file of the shell
you are using):

    export ECIINI=/usr/lib/ViaVoiceTTS/eci.ini

(Note: The files created by applying the patch are still not removed.)
(Note 2: The install scripts of these rpms are buggy too. See section 3.)


5. Proceed as per instructions
-------------------------------

Finally, we're at a stage where we can do what the instructions say to
start with.

5.1 Sound drivers

ViaVoice requires sound hardware and a linux driver which can record at
22kHz mono, and play back (see ViaVoice docs for details).

I have only one sound card to test with, and that's a Soundblaster PCI
64 with an Ensoniq 1370 chip. It's a pretty cheap card, but it works
(recognition accuracy may perhaps be better with a higher-quality card).
I am using the alsa 0.5.10 drivers which are installed by default by
SuSE. This does not support /dev/sndstat for this card/chip, but it's
not needed by ViaVoice. I am unsure whether /dev/audio support is needed
by ViaVoice; I don't think so, although the example in
/usr/doc/ViaVoice/en_us.rt.readme.txt to test the audio setup makes use
of it.

Test your audio setup with any other program; I used sound studio and
another program (neither of which comes with SuSE 7.2).

5.2 Audio mixer

It is important to find some mixer settings (output volume, input gain,
etc) which will work with ViaVoice, and to have these restored each time
ViaVoice is started.
To do this automatically when starting dictation, create a wrapper
script by running these commands (as root):

cat >/usr/bin/viavoice <<'EOF'
#!/bin/sh
#
# /usr/bin/viavoice
# Restore alsa mixer settings for ViaVoice and start dictation.
# Mixer settings are restored from the first of these files found:
#   $HOME/viavoice/alsa.conf
#   /etc/asound.conf.viavoice
#
# Volker Kuhlmann
#   9, 10 Sep; 10 Dec 2001; 3 Jan 2002
#

mixerloaded=
if alsactl 2>/dev/null; then
    for f in \
        "$HOME/viavoice/alsa.conf" \
        /etc/asound.conf.viavoice \
        ; do
        if [ -r "$f" ]; then
            echo "Restoring mixer settings from $f"
            alsactl -f "$f" restore "$@"
            mixerloaded=1
            break
        fi
    done
else
    echo "Can't execute alsactl - unable to restore mixer settings."
fi
test -z "$mixerloaded" && echo \
"Can't find any mixer settings to restore, ensure mixer settings are correct or ViaVoice might not run too well."
echo ""
exec /usr/bin/vvstartdictation "$@"
EOF
chmod 755 /usr/bin/viavoice

This script looks for these files, in this order, and restores the mixer
settings from the first one it finds:

    $HOME/viavoice/alsa.conf
    /etc/asound.conf.viavoice

It then starts dictation. A personal mixer settings file can be created
with

    alsactl -f $HOME/viavoice/alsa.conf store

or a system-wide one as root with

    alsactl -f /etc/asound.conf.viavoice store
    chmod 644 /etc/asound.conf.viavoice

after adjusting the mixer settings with a program of your choice. See
next section.

5.3 Adjustment of audio setup

Run vvstartuserguru and follow its instructions. Adjust mixer settings
if necessary, and restart the adjustment as shown by the program.
Immediately after this program terminates (and best before shutting down
your mixer), save the mixer settings for ViaVoice as per the previous
section.

Of course you can use any other method which you can think of to do the
same thing.


6.0 Start dictation
-------------------

Run viavoice (or vvstartdictation if you are not using the wrapper
script) and test it out.
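
To see which mixer settings file the wrapper script from section 5.2
would pick up, without running alsactl or starting dictation, the lookup
can be tried on its own. The following is only a minimal sketch and not
part of the patch: the function name first_readable is made up for this
example, and only the two file names are taken from section 5.2.

```shell
#!/bin/sh
# Hypothetical helper (not part of ViaVoice or the patch):
# print the first readable file among the arguments.
# Returns 0 if one was found, 1 if none is readable.
first_readable() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Same search order as in the wrapper script of section 5.2.
first_readable "$HOME/viavoice/alsa.conf" /etc/asound.conf.viavoice \
    || echo "No mixer settings file found."
```

If neither file exists yet, create one with alsactl as described in
section 5.2.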


7.0 To Do
----------------

Transfer another language from MS-windows to linux, and use that
instead.

Tests of recognition accuracy and system resource use.


A. Links
---------

Here are some related links:

http://volker.orcon.net.nz/linux/viavoice.html
    This HOWTO

http://volker.orcon.net.nz/linux/viavoice-3.1-patch.diff
    The patch to apply after installing the rpms. Contains all the
    modifications of sections 4 and 5 (excluding /etc/viavoiceps.conf)
    and the scripts which are newly created. This is the latest patch;
    previous ones can be found in the same directory.

http://volker.orcon.net.nz/linux/vvfiles/viavoice
http://volker.orcon.net.nz/linux/vvfiles/vvuser
    Up-to-date versions of the files which need to be created. Copy to
    /usr/bin. These are included in the patch above. Set permissions to
    755.

http://volker.orcon.net.nz/linux/vvfiles/viavoiceps.conf
    If you don't have it. Copy to /etc and set permissions to 644.

http://volker.orcon.net.nz/linux/vvfiles/vv-debug.tar.gz
    More debugging information for IBM's programmers.

http://www.ibm.com/software/speech/linux/dictation.html
    IBM ViaVoice for Linux

http://www-4.ibm.com/software/speech/dev/
    IBM ViaVoice Developer's Corner

http://www-4.ibm.com/software/speech/dev/sdk_linux.html
http://www-3.ibm.com/software/speech/dev/faq_linux.html
    IBM ViaVoice software development kit + FAQ

http://www6.software.ibm.com/dl/viavoice/runtime-p
    IBM download area

http://xvoice.sourceforge.net/
    The xvoice project

http://www.out-loud.com/
    By a long time user of speech recognition, Susan Fulton. Contains
    many articles and tips.

http://www.out-loud.com/linux.html
    Review of VV for linux (Aug 2000)

http://www.spracherkennung.de/
    ViaVoice.de | Alles über Spracherkennung (all about speech
    recognition, in German)

http://www.speechcontrol.com/microphones/Headsetindex.asp
    Headsets commercially available in the USA, and their ratings by the
    seller, Marty Markoe.

http://www.speechcontrol.com/articles/
    Many good articles.

http://www.talktoyourcomputer.com/whywedont.htm
    Why the home-level equipment is not adequate for serious use,
    compared with professional equipment. (They do seem to compare a
    cheap with an expensive headset though.)

http://www.bright.net/~dlphilp/linuxsound/
    A huge number of linux sound links.


B. Quick Start
---------------

## As root:

# SuSE 7.2 CD 5:
rpm -Uvh /media/cdrom/full-names/i386/IBMJava2-JRE-1.3-45.i386.rpm
# Mandrake powerpack 8.0 CD 4:
rpm -Uvh /media/cdrom/Mandrake/RPMS4/ViaVoice*
chmod a+r /etc/viavoiceps.conf /usr/lib/menu/vv*

# Download file(s)
umask 22
patch -b -p0 <viavoice-3.1-patch.diff
rm /usr/bin/vvuser.orig /usr/bin/viavoice.orig
chmod 755 /usr/bin/vvuser /usr/bin/viavoice

# Shell setup
# If you are not using bash, copy the line "export ECIINI=..." from /etc/local
# to wherever your shell needs it, and change the syntax to be correct for
# your shell.

## As normal linux user:

# ViaVoice user setup
vvuser -adduser "Any name you like to be called by ViaVoice" -setdefault
vvstartuserguru

# Audio mixer settings
# See sections 5.2 and 5.3 above

# Run ViaVoice dictation
viavoice
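
After running through the quick start, the fix of section 4.2 can be
checked with a few lines of shell. This is only a sketch, assuming the
four scripts are installed at the paths listed in section 4.2; the
function name has_shebang is made up for this example. It only tests
whether the first line of a file starts with "#!", not whether the
interpreter named there is a sensible one.

```shell
#!/bin/sh
# Hypothetical helper (not part of ViaVoice or the patch):
# succeed if the first line of file $1 starts with "#!".
# Returns 2 if the file cannot be read, 1 if there is no "#!" line.
has_shebang() {
    [ -r "$1" ] || return 2
    first=
    # read fails on a last line without newline; accept it if non-empty
    IFS= read -r first <"$1" || [ -n "$first" ] || return 1
    case $first in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# The scripts named in section 4.2.
for script in \
    /usr/bin/vvstartaudiosetup \
    /usr/bin/vvstartenrollment \
    /usr/bin/vvstartuserguru \
    /usr/lib/ViaVoice/bin/vvsetuser
do
    if has_shebang "$script"; then
        echo "ok:    $script"
    else
        echo "check: $script"
    fi
done
```

Any script reported with "check:" is either missing, unreadable, or
still lacks the first line described in section 4.2.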