FreeSpeech is a free and open-source (FOSS), cross-platform desktop application for dictation, voice transcription, and realtime speech recognition. It provides offline, speaker-independent voice recognition with dynamic language learning, using the PocketSphinx speech recognition engine.
Windows: We are waiting for pocketsphinx to be ported to gstreamer 1.0 before we can try again to make FreeSpeech work reliably on Windows. You can try it, but do not expect immediate success...
Linux/Cygwin: The following packages should be installed through the package manager.
- Python 2.7
- gstreamer, including gstreamer-python
- pocketsphinx and sphinxbase
- CMU-Cambridge Statistical Language Modeling Toolkit v2 (documentation)
On Fedora, for example, install these dependencies.
su -c 'yum -y install gstreamer-python sphinxbase-libs \
pocketsphinx-libs pocketsphinx sphinxbase pocketsphinx-plugin \
python-simplejson python-xlib pygtk2'
Ubuntu: Install pocketsphinx (https://launchpad.net/ubuntu/+source/pocketsphinx). Also install python-xlib, python-simplejson, and python-gtk2.
Download CMU-Cam_Toolkit_v2 and unpack it. Read the instructions in the README, edit the Makefile, and run make. Then manually copy the tools from the bin directory to somewhere in $PATH, such as /usr/local/bin.
Unpack FreeSpeech into a user-writeable folder such as Downloads/freespeech.
Language files and preferences are copied to $XDG_CONFIG_HOME, so be sure to set this environment variable to a user-writeable location if it isn't already set. (On most modern Linux systems it points to /home/$USER/.config.) Try it:
echo $XDG_CONFIG_HOME
If that command doesn't print anything, you need to add XDG_CONFIG_HOME to your environment. On Linux, this is done by editing .bashrc. Windows users do it another way.
Note: Be sure not to put anything in .bashrc that would echo text to the screen. That tends to mess up programs that spawn shells.
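The same check can be made from Python. The sketch below is not part of FreeSpeech: `config_home` is a hypothetical helper, written for Python 3, that returns the value of XDG_CONFIG_HOME and falls back to ~/.config when the variable is unset or empty. FreeSpeech itself may not apply that fallback, so treat this only as a diagnostic.

```python
import os


def config_home(environ=None):
    """Return the XDG config directory, falling back to ~/.config."""
    if environ is None:
        environ = os.environ
    value = environ.get("XDG_CONFIG_HOME", "")
    if value:
        return value
    # Unset or empty: use the conventional default location.
    return os.path.join(os.path.expanduser("~"), ".config")


if __name__ == "__main__":
    path = config_home()
    print("config directory:", path)
    print("XDG_CONFIG_HOME is set:", bool(os.environ.get("XDG_CONFIG_HOME")))
    print("writable:", os.access(path, os.W_OK))
```

If the second line prints False, the variable still needs to be added to the environment as described above.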
There is no desktop icon yet. Right-click on the desktop to create one. Launching the program may be done via the Python interpreter.
Position the microphone somewhere near your face and begin talking. To end a sentence, say "period" (or "colon", "question-mark", "exclamation-point"). Look at the dictionary, "custom.dic", for ideas.
Voice commands are included. A list of commands pops up at start-up, or you may say "show commands" to show it again. With the experimental "send keys" X keyboard emulation, only "scratch that" is implemented. The following voice commands are supported:
- file quit - quits the program
- file open - open a text file in the editor
- file save (as) - save the file
- show commands - pops up a customizable list of spoken commands
- editor clear - clears all text in the editor and starts over
- delete - delete [text] or erase selected text
- insert - move cursor after a word or punctuation mark, for example: "insert after period"
- select - select [text], for example: "select the states"
- go to the end - put cursor at end of document
- scratch that - erase last spoken text
- back space - erase one character
- new paragraph - equivalent to pressing Enter twice
Make sure the various requirements work. For example, check that "pocketsphinx" runs from the command line.
If you get messages like this:
Trouble writing /home/*/.config/FreeSpeech/freespeech.idngram
It usually means you haven't installed the CMU-Cambridge Statistical Language Modeling Toolkit v2 or there is a problem with the tools themselves. You must edit the Makefile and follow the instructions therein before running make. Then you must manually copy the files in the bin directory somewhere in your $PATH like /usr/local/bin on Linux or C:\windows\system32 on Windows.
For some reason, the toolkit expects to be able to write to /usr/tmp. The tmpfile() function uses the P_tmpdir defined in <stdio.h>, so perhaps it was compiled on such a machine. Building the toolkit from source should fix that. The quick-fix is to provide /usr/tmp for machines that don't have it.
ln -s /tmp /usr/tmp
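The two conditions described above (toolkit binaries reachable through $PATH, and a writable /usr/tmp) can be checked with a short script. This is a sketch for Python 3, not something shipped with FreeSpeech. The tool names are programs from the CMU-Cambridge toolkit; which of them FreeSpeech actually invokes is an assumption, and `missing_tools` and `tmp_dir_ok` are hypothetical helper names.

```python
import os
import shutil

# Programs from the CMU-Cambridge SLM Toolkit v2. Which of these FreeSpeech
# actually calls is an assumption made for this sketch.
TOOLS = ("text2wfreq", "wfreq2vocab", "text2idngram", "idngram2lm")


def missing_tools(tools=TOOLS, path=None):
    """Return the tools that cannot be found on the search path."""
    return [name for name in tools if shutil.which(name, path=path) is None]


def tmp_dir_ok(tmp_dir="/usr/tmp"):
    """True if tmp_dir exists and the current user can write to it."""
    return os.path.isdir(tmp_dir) and os.access(tmp_dir, os.W_OK)


if __name__ == "__main__":
    for name in missing_tools():
        print("not on PATH:", name)
    if not tmp_dir_ok():
        print("/usr/tmp is missing or not writable")
```

If the script prints nothing, both conditions hold and the "Trouble writing" message probably has another cause.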
The biggest improvements in accuracy have been achieved by adjusting the microphone position. The volume level is adjusted automatically, and tweaking it is not likely to improve accuracy much. You may try making a recording with Audacity and checking the noise levels to make sure it sounds like intelligible speech when played back.
Adapt PocketSphinx to your voice for better accuracy. See http://cmusphinx.sourceforge.net/wiki/tutorialadapt
The language corpus that ships with this download, "freespeech.ref.txt", is likely to be very limited. Our excuse is that the small size saves memory while providing room to learn your spoken grammar. Don't be surprised if it does not work very well for you at first. Use the keyboard to manually edit the text in the box until it says what you intended to say. Then hit the "Learn" button. It will try to do better at understanding you next time! You may also train it by pasting in gobs of text from websites and documents.
It seems that the PocketSphinx folks were trying to add support for capitalized words. If there is a word like "new" in the dictionary which could also be capitalized, as in "New Mexico", it is enough to make a capitalized copy like so:
new N UW
New N UW
Now PocketSphinx will decide the capitalization depending on the context in which the word appears in the text corpus. We tested it and it works! It capitalizes words like "New Mexico" and "The United States of America" but does not capitalize "altered states" nor "new pants". This is a wild idea, but maybe we could make a dictionary containing both capitalized and un-capitalized words. That would save us the effort of going through and capitalizing all the proper names. The only question is, would the resulting dictionary be too big? The solution is probably to propose a patch to make PocketSphinx ignore case in the dictionary, using the word or acronym as it is found in the corpus, not the dictionary.
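Making capitalized copies by hand gets tedious for more than a few words. Here is a minimal sketch of automating it, assuming each dictionary line has the form "word PHONE PHONE ..." as in the example above. `add_capitalized` is a hypothetical helper, not part of FreeSpeech or PocketSphinx.

```python
def add_capitalized(dic_lines, words):
    """Return dic_lines plus a capitalized copy of each entry named in words.

    Each line is expected to look like "word PHONE PHONE ...".
    """
    existing = {line.split()[0] for line in dic_lines if line.strip()}
    result = []
    for line in dic_lines:
        result.append(line)
        parts = line.split()
        if not parts:
            continue
        word, phones = parts[0], parts[1:]
        capitalized = word.capitalize()
        if word in words and capitalized != word and capitalized not in existing:
            # Same pronunciation, capitalized spelling.
            result.append(" ".join([capitalized] + phones))
            existing.add(capitalized)
    return result


print(add_capitalized(["new N UW", "pants P AE N T S"], {"new"}))
# ['new N UW', 'New N UW', 'pants P AE N T S']
```

Only the words passed in are copied, so the dictionary grows by one line per proper name rather than doubling in size.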
Don't worry if you inadvertently teach PocketSphinx bad grammar. It's not strictly necessary, but our corpus file, "lm/freespeech.ref.txt" may be manually corrected if it develops bad speech habits. Changes will apply next time you hit the "Learn" button.
The language model may be further tweaked and improved.
If there is a word that it stubbornly refuses to recognize, even after teaching it with several sentences, you may need to edit the dictionary: "freespeech.dic"
Sometimes the dictionary pronunciation can be a little bit off. You will notice that some other words have alternate pronunciations denoted with (2). Go ahead and add an alternate and see if it doesn't improve immediately the next time the program starts.
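As an illustration of the (2) convention, the sketch below builds the next alternate line for a word. It assumes the base entry counts as pronunciation 1, so the first alternate is numbered (2). `alternate_entry` is a hypothetical helper, and the "tomato" phonemes are only an example.

```python
import re


def alternate_entry(dic_lines, word, phones):
    """Build the next "word(n) PHONES" line for word.

    The base entry counts as pronunciation 1, so the first alternate is (2).
    """
    pattern = re.compile(r"^" + re.escape(word) + r"(?:\((\d+)\))?$")
    highest = 0
    for line in dic_lines:
        parts = line.split()
        if not parts:
            continue
        match = pattern.match(parts[0])
        if match:
            highest = max(highest, int(match.group(1) or 1))
    if highest == 0:
        # The word is not in the dictionary yet: emit a base entry.
        return "{} {}".format(word, phones)
    return "{}({}) {}".format(word, highest + 1, phones)


dic = ["tomato T AH M EY T OW"]
print(alternate_entry(dic, "tomato", "T AH M AA T OW"))
# tomato(2) T AH M AA T OW
```

Append the returned line to "freespeech.dic" and restart the program to try the new pronunciation.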
This dictionary is based on Pocketsphinx's cmu07a.dic because it contains punctuation (Can you say ".full-stop" or "?question-mark") and some capitalized words. See "freespeech.dic" for the list of punctuation and their pronunciations. Adding new words to the dictionary may be done manually, along with their phonetic representation, but we are working on incorporating a word trainer.
About the CMU Pronouncing Dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Security and privacy
FreeSpeech does not send information over the network. Speech recognition is done locally using pocketsphinx. Learned speech patterns are stored in "plain text" format in "lm/freespeech.ref.txt". You probably do not want to teach FreeSpeech sensitive or private information like passwords, especially if other people will have access to your PC.