ViaVoice Dictation for SuSE Linux 7.2/7.3

     HOW TO GET IBM'S VIAVOICE DICTATION SOFTWARE TO GO ON SUSE LINUX 7.X
     ====================================================================

By Volker Kuhlmann <v.kuhlmann@elec.canterbury.ac.nz>

Created 9, 10 September 2001
Updated 11 Sep 2001: links section
Updated 18 Oct 2001: IBM Java version
Updated 05 Nov 2001: SuSE 7.3
Updated 22 Nov 2001: clarified some points
Updated 04 Jan 2002: how to enroll correctly; reducing waits

Contents
	1.  Introduction
	2.  Software components
	2.1 Java
	3.  Installation
	3.1 SuSE 7.3
	4.  Bug fixing before wasting time on anything else
	4.1 /usr/bin/vvsetenv
	4.2 /bin/sh
	4.3 /usr/bin/vvstartdictation
	4.4 /usr/bin/vvstartuserguru
	4.5 /etc/viavoiceps.conf
	4.6 $HOME/viavoice/
	4.7 Creating a ViaVoice user
	4.8 Enrollment (Creating a Voice Model)
	4.9 Uninstalling ViaVoice
	5.  Proceed as per instructions
	5.1 Sound drivers
	5.2 Audio mixer
	5.3 Adjustment of audio setup
	6.0 Start dictation
	7.0 To Do
	A.  Links
	B.  Quick Start


1.  Introduction
----------------

Apart from some projects which don't look too promising, IBM's ViaVoice
Dictation is the only voice recognition software available for linux. It
consists of two parts: the recognition engine, and a user interface. The user
interface is programmed in java, and handles user registration + training, and
allows dictation into a text field. Compared with software available for
Microsoft windows, the dictation interface is not very advanced, though (some
of the?) source code is available for anyone wishing to improve on it.

The recognition engine is available for free download from IBM together with a
time-limited evaluation license and a software development kit. There are
projects which make use of the recognition engine, most notably xvoice
(http://download.sourceforge.net/xvoice/).

This split into engine and user interface is very smart and leaves room for
some interesting possibilities of making use of the engine.

The dictation software and engine runtime can be purchased from IBM for US$60.
It includes some kind of headset and is only available for US English.
Unfortunately, IBM only sells it to people in the USA. Whatever the reason for
this daft decision is, we'll have to live with it for the foreseeable future.
(It is perhaps unwise to complain to the speech recognition group about this
via their mailing lists; these people are very helpful and have probably not
made that decision.)

Finding a dealer in the USA who is willing to ship a box to New Zealand for a
lesser fee than FedEx is asking is not something I'd like to get into if I can
avoid it.

An alternative (the only alternative?) is to go via Mandrake. Mandrake has
licensed viavoice for linux for US + UK English, German and French, and there
seem to be some hooks for Spanish. Obviously this is not included in the GPL
version of Mandrake, only in the powerpack.

The powerpack works out at about the same price as IBM's dictation pack, albeit
without headset. I would have preferred to pay my support directly to IBM, but
sorry - see above.

PLEASE NOTE that the instructions in here are SPECIFICALLY FOR THE VIAVOICE
FROM MANDRAKE. Mandrake 8.0 is the one I have. Because I have no copy of IBM's
boxed set available, I am unable to tell to what extent this applies to the
boxed set. It is probable that some of it will, but feedback (thanks Jim Mohr)
has it that not all of it is directly applicable.

Thanks also to David Morgan who said the instructions in here are also useful
for making ViaVoice work on Mandrake. There are some minor differences:
(section 4.1) SPCH_JAVA needs to be set to /usr/lib/java-1.3/jre, and the line
with LD_LIBRARY_PATH is somewhere else - remove it anyway.


2.  Software components
-----------------------

a) From IBM / Mandrake:

The viavoice software is from CD 4 (Commercial Apps 2) of a Mandrake linux
powerpack 8.0, ISBN 1-57595-493-1. File dates (in GMT), sizes, md5 sums etc:

    5675307 2001-04-18 10:56:28 ViaVoice_Dictation-1.1-0.0.i386.rpm
    1287476 2001-04-18 10:56:29 ViaVoice_TTS_rtk-5.1-1.2.i386.rpm
    2846764 2001-04-18 10:56:29 ViaVoice_runtime-3.1-0.0.i386.rpm
  185041363 2001-04-18 10:58:34 ViaVoice_runtime_US_LangPack-3.1-0.0.i386.rpm

f01b9a13da086e63c006891686c2f619  ViaVoice_Dictation-1.1-0.0.i386.rpm
0aafee2bb6fb82b89abcf8f2c1baefc8  ViaVoice_TTS_rtk-5.1-1.2.i386.rpm
70c5a72d1c17794b6953e8de5924e293  ViaVoice_runtime-3.1-0.0.i386.rpm
8355ddf8db19ae0e4ff5620649257691  ViaVoice_runtime_US_LangPack-3.1-0.0.i386.rpm
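
The md5 sums above can be compared against your own copies of the rpms before
installing. A minimal sketch, assuming GNU md5sum is installed; check_md5 is a
made-up helper name, not something shipped with ViaVoice:

```shell
#!/bin/sh
# check_md5 FILE EXPECTED_SUM
# Succeeds if the md5 sum of FILE equals EXPECTED_SUM.
# (check_md5 is a hypothetical helper for this example.)
check_md5() {
    # md5sum prints "<sum>  <file>"; keep only the sum.
    sum=$(md5sum "$1" 2>/dev/null | cut -d' ' -f1)
    [ -n "$sum" ] && [ "$sum" = "$2" ]
}

# Usage, from the directory holding the rpms:
#   check_md5 ViaVoice_runtime-3.1-0.0.i386.rpm \
#       70c5a72d1c17794b6953e8de5924e293 && echo ok || echo MISMATCH
```

A mismatch only means your file differs from the one described here; it may
simply be a different release of the same package.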

These rpm packages are compiled and packaged by IBM (unless the package
information is forged):

> cd /media/cdrom/Mandrake/RPMS4
> rpm -qp --queryformat "%-28{name}, %-3{packager}, %-1{buildhost}\n" Via*
ViaVoice_Dictation          , IBM, linuxbuild1.bocaraton.ibm.com
ViaVoice_TTS_rtk            , IBM, linuxbuild5.bocaraton.ibm.com
ViaVoice_runtime            , IBM, linuxbuild1.bocaraton.ibm.com
ViaVoice_runtime_US_LangPack, IBM, linuxbuild1.bocaraton.ibm.com

According to Damon Lynch, the IBM dictation pack contains:

	ViaVoice_Dictation-1.0-1.0
	ViaVoice_TTS_rtk-5.1-1.2
	ViaVoice_runtime-3.0-1.2
	ViaVoice_sdk-3.0-1.1

So the software shipped with Mandrake is actually slightly newer. Apart from
having some problems fixed, it also contains new bugs...

b) From SuSE:

The install is on a stock SuSE 7.2 system, running the supplied 2.4.4-4GB
kernel. The CDs contain the Application ID in the iso9660 filesystem:

	SuSE-Linux-Professional-INT-i386-7.2.0#0

Also tested on a stock SuSE 7.3 system with the supplied 2.4.10-4GB kernel.
Application ID of the CDs:

	SuSE-Linux-Professional-INT-i386-7.3.0#0


2.1 Java

IBM clearly states that dictation requires java 1.2.2rc4 from
www.blackdown.org. This is certainly true for IBM's boxed set of dictation.

The rpms shipped with Mandrake contain a dependency on IBM's java2 1.3, as
shipped with Mandrake and SuSE. As the rpms are packaged by IBM, one can assume
that ViaVoice dictation 3.1 will now also run with IBM's java.

The version of IBM's java shipped with SuSE 7.2 is just fine:

   15983791 2001-05-16 10:37:17 IBMJava2-JRE-1.3-45.i386.rpm

In July 2001 SuSE released an updated java version
(http://www.suse.de/de/support/download/updates/72_i386.html,
IBMJava2-JRE-1.3-67.i386.rpm) but I have not tried this myself (upgrading to
SuSE 7.3 will show). It is unlikely that there are problems.

The version of IBM's java shipped with SuSE 7.3 is fine as well:

   17263096 2001-09-23 20:51:13 IBMJava2-JRE-1.3-109.i386.rpm


3.  Installation
----------------

This is very easy: install package IBMJava2-JRE from SuSE CD 5, and the 4
viavoice rpms from Mandrake CD 4. Use yast, yast2, or just type:

	rpm -Uvh /media/cdrom/full-names/i386/IBMJava2-JRE-1.3-45.i386.rpm
	
	rpm -Uvh /media/cdrom/Mandrake/RPMS4/ViaVoice*

This requires about 280MB of disk space, and doesn't include any files created
while running viavoice.
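
To check the free disk space before installing, something like the following
can be used. This is only a sketch: enough_space is a made-up helper name, and
checking /usr is an assumption based on where most ViaVoice files end up
(/usr/lib/ViaVoice).

```shell
#!/bin/sh
# enough_space DIR NEED_MB
# Succeeds if the filesystem holding DIR has at least NEED_MB megabytes free.
# (enough_space is a hypothetical helper for this example.)
enough_space() {
    # df -P prints a header plus one line per filesystem;
    # column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$1" | awk 'NR==2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

# Usage:
#   enough_space /usr 280 || echo "Less than 280MB free on /usr"
```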

Some minor things need to be fixed, run (as root):

chmod a+r /etc/viavoiceps.conf /usr/lib/menu/vv*

Ignore the several errors you see; the rpm install scripts are buggy. (Who had
this idea of calling gless? Pity the license doesn't show if gless doesn't
exist, like on a KDE or non-Gnome system...)

IBM's boxed set probably doesn't contain the 2 files /usr/lib/menu/vv*, they're
Mandrake-specific and useless on a SuSE system. Delete them if you like.

If you think you're done now, better think again and read section 4!


3.1 SuSE 7.3

Proceed as described in here for SuSE 7.2. Before starting viavoice, change
variable SPCH_JAVA (towards the end) in /usr/bin/vvsetenv from
	export SPCH_JAVA=/usr/lib/jre1.3/jre
to:
	export SPCH_JAVA=/usr/lib/jdk1.3/jre


4.  Bug fixing before wasting time on anything else
---------------------------------------------------

Before the effort is of any use, we need to do some serious tidying up.

All these modifications are contained in a patch file which can be downloaded
from http://volker.orcon.net.nz/, see exact URL at end.

Apply the patch with (as root):

umask 22
patch -b -p0 <viavoice-3.1-patch.diff
rm /usr/bin/vvuser.orig /usr/bin/viavoice.orig
chmod 755 /usr/bin/vvuser /usr/bin/viavoice


4.1 /usr/bin/vvsetenv

Add a line (as root):

	export SPCH_JAVA=/usr/lib/jre1.3/jre

This, once we have made further modifications, will tell the various parts of
dictation where to find the java we want/need to use. IBM should really make
use of such a variable! Especially vvstartdictation will just die otherwise...

While you're at it, delete the line starting with LD_LIBRARY_PATH (it does
nothing), and move the line #!/bin/bash to be the very first in the file
(otherwise it's useless, and yes, as vvsetenv is only sourced it's still
useless at the top).

Add these lines (as root):

	export SPCH_WAITEXIT=1
	export SPCH_WAITJAVA=0

This, together with modifications in other shell scripts, reduces the time for
which viavoice sits there waiting and doing absolutely nothing. It now starts
and exits much faster. (These modifications are included in the patch, but not
listed any further in this HOWTO).
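
To check that vvsetenv ended up with the export lines described in this
section, a sketch like the following can be used. missing_exports is a made-up
helper name; it only looks for lines of the form "export NAME=" and knows
nothing else about the file.

```shell
#!/bin/sh
# missing_exports FILE VAR...
# Prints each VAR for which FILE has no "export VAR=" line.
# (missing_exports is a hypothetical helper for this example.)
missing_exports() {
    file=$1
    shift
    for v in "$@"; do
        grep -q "^[[:space:]]*export[[:space:]][[:space:]]*$v=" "$file" ||
            echo "$v"
    done
}

# Usage:
#   missing_exports /usr/bin/vvsetenv SPCH_JAVA SPCH_WAITEXIT SPCH_WAITJAVA
```

No output means all three variables are exported.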


4.2 /bin/sh

Various startup scripts do not specify which shell they need to run under, so
the system will take the user's default. If that happens to be e.g. tcsh,
viavoice dictation will only ever spit some garbage onto the screen and
terminate after a second or two.

To fix this, add as very first line (as root):

#!/bin/sh

to these scripts:

/usr/bin/vvstartaudiosetup
/usr/bin/vvstartenrollment
/usr/bin/vvstartuserguru
/usr/lib/ViaVoice/bin/vvsetuser
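
Instead of editing each script by hand, the line can be added with a small
helper. This is a sketch only: add_shebang is a made-up name, it assumes mktemp
is installed, and it leaves alone any file which already starts with "#!".

```shell
#!/bin/sh
# add_shebang FILE
# Inserts "#!/bin/sh" as the first line of FILE unless FILE already
# starts with "#!".
# (add_shebang is a hypothetical helper for this example.)
add_shebang() {
    case $(head -n 1 "$1") in
        '#!'*) return 0 ;;
    esac
    tmp=$(mktemp) || return 1
    # Build the new content in a temp file, then overwrite FILE in place
    # so that it keeps its owner and permissions.
    { echo '#!/bin/sh'; cat "$1"; } > "$tmp" &&
        cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage (as root):
#   for f in /usr/bin/vvstartaudiosetup /usr/bin/vvstartenrollment \
#            /usr/bin/vvstartuserguru /usr/lib/ViaVoice/bin/vvsetuser; do
#       add_shebang "$f"
#   done
```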


4.3 /usr/bin/vvstartdictation

This script contains a major bug: it runs the java from
/usr/lib/ViaVoice/IBMJava2-13, but Mandrake installs the java into
/opt/IBMJava2. That ViaVoice starts at all on Mandrake is therefore pure
chance!

SuSE installs the java somewhere else again, so we deal with this properly and
make use of the variable SPCH_JAVA introduced into vvsetenv for this reason.
Change the line (towards the end, and as root)

export PATH=/usr/lib/ViaVoice/IBMJava2-13/jre/bin:$PATH

to

export PATH=$SPCH_JAVA/bin:$PATH


4.4 /usr/bin/vvstartuserguru

This is programmed in java as well, and likewise needs to be told properly
where to find the java to use. Change the line towards the end (as root)

export PATH=/opt/IBMJava2-13/jre/bin:$PATH

to

export PATH=$SPCH_JAVA/bin:$PATH
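
The PATH change in sections 4.3 and 4.4 can also be made with sed. A sketch,
assuming the line to change starts at the beginning of the line and looks
exactly as quoted above; use_spch_java is a made-up helper name, and mktemp is
assumed to be installed.

```shell
#!/bin/sh
# use_spch_java FILE
# Rewrites a line of the form
#     export PATH=<some dir>/IBMJava2-13/jre/bin:$PATH
# in FILE to
#     export PATH=$SPCH_JAVA/bin:$PATH
# All other lines are left untouched.
# (use_spch_java is a hypothetical helper for this example.)
use_spch_java() {
    tmp=$(mktemp) || return 1
    # Single quotes keep the shell from expanding $PATH and $SPCH_JAVA here;
    # they must end up in the file literally.
    sed 's|^export PATH=/.*/IBMJava2-13/jre/bin:\$PATH$|export PATH=$SPCH_JAVA/bin:$PATH|' \
        "$1" > "$tmp" &&
        cat "$tmp" > "$1"    # overwrite in place, keeps permissions
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage (as root):
#   use_spch_java /usr/bin/vvstartdictation
#   use_spch_java /usr/bin/vvstartuserguru
```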


4.5 /etc/viavoiceps.conf

This is the absolute bummer of a bug. If this file is missing, running
vvstartuserguru (which is the first thing to do after installation) will
terminate with the completely bogus error "The Speech System is in use by
another application". And that before it even attempts to open the sound
device! Of course there isn't another program using the sound system either
(this can be tested with

	lsof +D /dev

*as root*).

Furthermore, this file is not contained in any of the viavoice rpms, nor is it
created directly by the rpms' installation scripts. Somehow by a minor miracle,
this file turns up out of nowhere when installing the rpms, although I'm not
sure whether that is always the case.

Check if you have it. It's easiest to download the file (see Links section);
this file is *not* included in the patch. Otherwise, run as root (the
indentation is tabs - fix it if necessary, or download the file):

cat >/etc/viavoiceps.conf <<EOF
#############################################
# Generated options file - do not edit
#############################################
ViaVoice Runtimes/RTConfig
	bin = /usr/lib
	langs = /usr/lib/ViaVoice/vocabs/langs/%s
	maps = /usr/lib/ViaVoice/vocabs/langs/%s/map
	macros = /usr/lib/ViaVoice/vocabs/langs/%s/macros
	vocabs = /usr/lib/ViaVoice/vocabs
	sharedbin = /usr/lib
ViaVoice Runtimes/RTConfig/Runtimes/Reco
	version = 3.1.0.0
ViaVoice Runtimes/RTConfig/Runtimes/Reco/Languages
	En_US = 1
ViaVoice Runtimes/RTConfig/Runtimes/Control
	version = 3.1.0.0
ViaVoice Runtimes/RTConfig/Runtimes/Control/Languages
	En_US = 1
/
VoiceType\Audio\Devices
VoiceType\Audio\Devices\StdMicJack
	AudioDevInfoVersion = 1
	EnrollSamplingRate = 22
	DictationDeviceType = Microphone
	EnrollDeviceType = Microphone
	AudioSetupDeviceType = Microphone
VoiceType\Audio\Devices\StdMicJack\En_US
	DescriptionText = Microphone
EOF
chmod 644 /etc/viavoiceps.conf


As Damon Lynch tells me, this problem does not occur with the dictation rpms
version 3.0 which are in IBM's boxed set.
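
A quick sanity check for this file could look like the sketch below.
conf_looks_ok is a made-up helper name, and testing only for the first section
header is an assumption about what is worth checking; it does not prove the
file is complete.

```shell
#!/bin/sh
# conf_looks_ok [FILE]
# Succeeds if FILE (default /etc/viavoiceps.conf) is readable and contains
# the line "ViaVoice Runtimes/RTConfig".
# (conf_looks_ok is a hypothetical helper for this example.)
conf_looks_ok() {
    f=${1:-/etc/viavoiceps.conf}
    [ -r "$f" ] && grep -qFx 'ViaVoice Runtimes/RTConfig' "$f"
}

# Usage (as the user who will run ViaVoice, to test readability too):
#   conf_looks_ok || echo "/etc/viavoiceps.conf is missing or unreadable"
```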


4.6 $HOME/viavoice/

This directory is created by dictation sometime during the initial
configuration, when running vvstartuserguru. It has the extremely annoying
effect of causing dictation's java program to segfault early on, or to die with
some other error.

The fix is easy (as a normal user):

	rm -rf $HOME/viavoice

At this early stage there aren't any useful files in there, but keep in mind
that this directory contains your viavoice users, voice models, and voice
training results later on.
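
If you would rather not delete anything, renaming the directory should have the
same effect, on the assumption that ViaVoice only looks at $HOME/viavoice
itself. A sketch (set_aside is a made-up helper name):

```shell
#!/bin/sh
# set_aside DIR
# Renames DIR to DIR.old-<timestamp> instead of deleting it.
# Does nothing if DIR does not exist.
# (set_aside is a hypothetical helper for this example.)
set_aside() {
    [ -e "$1" ] || return 0
    mv "$1" "$1.old-$(date +%Y%m%d%H%M%S)"
}

# Usage (as a normal user):
#   set_aside "$HOME/viavoice"
```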

(IBM: There is more debugging info, see Links section.)


4.7 Creating a ViaVoice user

As per /usr/doc/ViaVoice/en_us.rt.readme.txt, running vvstartuserguru should
finally adjust the sound system and create a ViaVoice user. Unfortunately, it
segfaults when $HOME/viavoice exists, and it also segfaults when $HOME/viavoice
does not exist. Catch 22. No comment as to programming quality.

Luckily, there is another way. Make sure /etc/viavoiceps.conf exists, as per
above. Then run vvuser. It's not in the path, we'll take care of that and at the
same time make it a stand-alone program. Run this (as root):

cat >/usr/bin/vvuser <<'EOF'
#!/bin/sh
# Run the program of ViaVoice which deals with creating/deleting ViaVoice
# users and voice models. Make this a stand-alone program.
# Volker Kuhlmann
#   9 Sep 2001
source vvsetenv
exec /usr/lib/ViaVoice/bin/vvuser "$@"
EOF
chmod 755 /usr/bin/vvuser

This file is created when applying the patch, but you still need to chmod it.

Now create a ViaVoice user (as a normal linux user):

vvuser -adduser "Any name you like to be called by ViaVoice" -setdefault

This has also created a voice model for the user. You should create a new voice
model (and repeat the enrollment) each time you change microphone or
location.

	vvuser -addvoicemodel "My other desk" -setdefault

As there is no way to add more users or voice models using the graphical user
interface, this is a handy program to know of.


4.8 Enrollment (Creating a Voice Model)

IBM says to use vvstartuserguru for this, however it crashes the very first
time (java segfault). vvstartuserguru does not allow creating additional users
or additional voice models for existing user(s), see 4.7 above. Use vvuser.

After running vvuser, you must still run vvstartuserguru. Although the audio
setup and the story reading can be performed with vvstartaudiosetup and
vvstartenrollment, the short speech sample is not processed without running
vvstartuserguru.

NOTE (read this twice!):

  RUNNING VVSTARTENROLLMENT AFTER VVUSER WILL NOT PROCESS THE SHORT VOICE
  SAMPLE BEFORE YOU READ THE STORY; AS A RESULT OF SPENDING THE TIME TO READ
  THE STORY, THE RECOGNITION ACCURACY WILL DROP TO ONE RANDOM WORD PER SPOKEN
  SENTENCE, I.E. ZERO.

There is no warning about this, you will just be wasting your time and then
waste even more time trying to figure out why it doesn't work. Been there, done
that. I would gladly spend quite a bit more money on a product which was
engineered soundly!
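
The order which works (vvuser first, then vvstartuserguru, and only after that
any further enrollment) can be wrapped up as in the sketch below. new_vv_user
is a made-up wrapper name; it assumes both commands are in the PATH (see 4.7)
and that vvuser returns a non-zero exit status on failure, which I have not
verified.

```shell
#!/bin/sh
# new_vv_user NAME
# Creates ViaVoice user NAME with vvuser, then runs vvstartuserguru so that
# the short voice sample is processed before any story is read.
# (new_vv_user is a hypothetical wrapper for this example.)
new_vv_user() {
    vvuser -adduser "$1" -setdefault || return 1
    vvstartuserguru
}

# Usage (as a normal linux user):
#   new_vv_user "Any name you like to be called by ViaVoice"
```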


4.9 Uninstalling ViaVoice

This is only listed here to mention another bug. Removing the ViaVoice rpms
fails because the script(s) these rpms contain are buggy. To clean up
afterwards, run these commands as root:

rpm -e ... (the 4 ViaVoice rpms)
rpm -e --noscripts ViaVoice_runtime_US_LangPack
rm -rf /usr/lib/ViaVoiceTTS /etc/viavoiceps.conf

Remove this line from /etc/profile (and/or the shell you are using):
export ECIINI=/usr/lib/ViaVoiceTTS/eci.ini

(Note: The files created by applying the patch are still not removed.)
(Note 2: The install scripts of these rpms are buggy too. See section 3.)


5.  Proceed as per instructions
-------------------------------

Finally, we're at a stage where we can do what the instructions say to start
with.


5.1 Sound drivers

ViaVoice requires sound hardware and a linux driver which can record at 22kHz
mono, and play back (see ViaVoice docs for details).

I have only one sound card to test with, and that's a Soundblaster PCI 64 with
an Ensoniq 1370 chip. It's a pretty cheap card, but it works (recognition
accuracy may perhaps be better with a higher-quality card).

I am using the alsa 0.5.10 drivers which are installed by default by SuSE. This
does not support /dev/sndstat for this card/chip, but it's not needed by
ViaVoice. I am unsure whether /dev/audio support is needed by ViaVoice, I don't
think so, although the example in /usr/doc/ViaVoice/en_us.rt.readme.txt to test
the audio setup makes use of it.

Test your audio setup with any other program; I used sound studio and one
other program (neither of which comes with SuSE 7.2).


5.2 Audio mixer

It is important to find some mixer settings (output volume, input gain, etc)
which will work with ViaVoice, and to have these restored each time ViaVoice is
started. To do this automatically when starting dictation, create a wrapper
script by running these commands (as root):

cat >/usr/bin/viavoice <<'EOF'
#!/bin/sh
#
# /usr/bin/viavoice
# Restore alsa mixer settings for ViaVoice and start dictation.
# Mixer settings are restored from the first of these files found:
#   $HOME/viavoice/alsa.conf
#   /etc/asound.conf.viavoice
#
# Volker Kuhlmann
#   9, 10 Sep; 10 Dec 2001; 3 Jan 2002
#

mixerloaded=
if alsactl 2>/dev/null; then
    for f in \
	$HOME/viavoice/alsa.conf \
    	/etc/asound.conf.viavoice \
	; do
    	
	if [ -r "$f" ]; then
	    echo "Restoring mixer settings from $f"
	    alsactl -f "$f" restore "$@"
	    mixerloaded=1
	    break
	fi
    done
else
    echo "Can't execute alsactl - unable to restore mixer settings."
fi
test -z "$mixerloaded" && echo \
"Can't find any mixer settings to restore, ensure mixer settings are correct
or ViaVoice might not run too well."
echo ""
exec /usr/bin/vvstartdictation "$@"
EOF
chmod 755 /usr/bin/viavoice


This script looks for these files and restores the mixer settings from the first
one it finds:
	$HOME/viavoice/alsa.conf
	/etc/asound.conf.viavoice
It then starts dictation.

A personal mixer settings file can be created with

  alsactl -f $HOME/viavoice/alsa.conf store

or a system-wide one as root with

  alsactl -f /etc/asound.conf.viavoice store
  chmod 644 /etc/asound.conf.viavoice

after adjusting the mixer settings with a program of your choice. See next
section.


5.3 Adjustment of audio setup

Run vvstartuserguru and follow its instructions. Adjust mixer settings if
necessary, and restart the adjustment as shown by the program.

Immediately after this program terminates (and best before shutting down your
mixer), save the mixer settings for ViaVoice as per the previous section.

Of course you can use any other method which you can think of to do the same
thing.


6.0 Start dictation
-------------------

Run viavoice (or vvstartdictation if you are not using the wrapper script) and
test it out.


7.0 To Do
---------

Transfer another language from MS-windows to linux, and use that instead.

Tests of recognition accuracy and system resource use.


A.  Links
---------

Here are some related links:

http://volker.orcon.net.nz/linux/viavoice.html
	This HOWTO

http://volker.orcon.net.nz/linux/viavoice-3.1-patch.diff
	The patch to apply after installing the rpms. Contains all the
	modifications of sections 4 and 5 (excluding /etc/viavoiceps.conf) and
	the scripts which are newly created.
	
	This is the latest patch; previous ones can be found in the same
	directory.

http://volker.orcon.net.nz/linux/vvfiles/viavoice
http://volker.orcon.net.nz/linux/vvfiles/vvuser
	Up-to-date versions of the files which need to be created. Copy to
	/usr/bin. These are included in the patch above.
	Set permissions to 755.

http://volker.orcon.net.nz/linux/vvfiles/viavoiceps.conf
	If you don't have it. Copy to /etc and set permissions to 644.

http://volker.orcon.net.nz/linux/vvfiles/vv-debug.tar.gz
	More debugging information for IBM's programmers.

http://www.ibm.com/software/speech/linux/dictation.html
	IBM ViaVoice for Linux

http://www-4.ibm.com/software/speech/dev/
	IBM ViaVoice Developer's Corner

http://www-4.ibm.com/software/speech/dev/sdk_linux.html
http://www-3.ibm.com/software/speech/dev/faq_linux.html
	IBM ViaVoice software development kit + FAQ

http://www6.software.ibm.com/dl/viavoice/runtime-p
	IBM download area

http://xvoice.sourceforge.net/
	The xvoice project

http://www.out-loud.com/
	By a long time user of speech recognition, Susan Fulton. Contains many
	articles and tips.
http://www.out-loud.com/linux.html
	Review of VV for linux (Aug 2000)

http://www.spracherkennung.de/
	ViaVoice.de | Alles über Spracherkennung

http://www.speechcontrol.com/microphones/Headsetindex.asp
	In the USA commercially available headsets and their ratings by the
	seller, Marty Markoe.
http://www.speechcontrol.com/articles/
	Many good articles.

http://www.talktoyourcomputer.com/whywedont.htm
	Why the home-level equipment is not adequate for serious use, compared
	with professional equipment. (They do seem to compare a cheap with an
	expensive headset though.)

http://www.bright.net/~dlphilp/linuxsound/
	A huge number of linux sound links.


B.  Quick Start
---------------

## As root:

# SuSE 7.2 CD 5:
rpm -Uvh /media/cdrom/full-names/i386/IBMJava2-JRE-1.3-45.i386.rpm

# Mandrake powerpack 8.0 CD 4:
rpm -Uvh /media/cdrom/Mandrake/RPMS4/ViaVoice*

chmod a+r /etc/viavoiceps.conf /usr/lib/menu/vv*

# Download file(s)
umask 22
patch -b -p0 <viavoice-3.1-patch.diff
rm /usr/bin/vvuser.orig /usr/bin/viavoice.orig
chmod 755 /usr/bin/vvuser /usr/bin/viavoice

# Shell setup
# If you are not using bash, copy the line "export ECIINI=..." from /etc/profile
# to wherever your shell needs it, and change the syntax to be correct for
# your shell.

## As normal linux user:

# ViaVoice user setup
vvuser -adduser "Any name you like to be called by ViaVoice" -setdefault
vvstartuserguru

# Audio mixer settings
# See sections 5.2 and 5.3 above

# Run ViaVoice dictation
viavoice