Notes on Sphinx2
Creating Language Models
How to build the language model and dictionary for Sphinx2:
http://www.speech.cs.cmu.edu/sphinx/doc/sphinx-FAQ.html
Sphinx2 only needs a pronunciation dictionary (.dict) and a language model (.lm).
Building a dictionary:
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
ftp://ftp.cs.cmu.edu/afs/cs.cmu.edu/data/anonftp/project/fgdata/dict/
ftp://ftp.cs.cmu.edu/afs/cs.cmu.edu/data/anonftp/project/fgdata/dict/cmudict.0.6.gz
Grep the dictionary for each word in your vocabulary.
Strip out the stress numbers (the 0/1/2 digits on vowel phones, e.g. AH0 becomes AH).
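The two steps above can be sketched as a small shell function. This is a minimal sketch, assuming cmudict.0.6 has already been gunzipped and that your words are listed one per line in a plain text file; make_dict and the file names are hypothetical, not part of Sphinx2 or cmudict. Alternate pronunciations such as WORD(2) are kept, and digits are removed only from the phone fields, so the (2) suffix survives. Words that cmudict does not contain produce no output and still need hand-written pronunciations.

```shell
# make_dict: hypothetical helper, not part of Sphinx2 or cmudict.
#   $1 = word list, one word per line (any case)
#   $2 = uncompressed cmudict file, e.g. cmudict.0.6
# Prints matching entries with stress digits removed from the phones.
make_dict() {
  awk '
    # First file: load the wanted words, upper-cased to match cmudict.
    FILENAME == ARGV[1] { if (NF) want[toupper($1)] = 1; next }
    # Second file: skip cmudict comment lines.
    /^##/ { next }
    {
      base = $1
      sub(/\([0-9]+\)$/, "", base)    # WORD(2) -> WORD for the lookup
      if (!(base in want)) next
      line = $1                       # keep the original headword, e.g. WORD(2)
      for (i = 2; i <= NF; i++) {
        gsub(/[0-9]/, "", $i)         # AH0 -> AH, AE1 -> AE
        line = line " " $i
      }
      print line
    }
  ' "$1" "$2"
}

# Example: gunzip cmudict.0.6.gz && make_dict words.txt cmudict.0.6 > output/zork.dict
```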
Building language models:
http://www.speech.cs.cmu.edu/SLM/toolkit.html
http://www.speech.cs.cmu.edu/SLM/CMU-Cam_Toolkit_v2.tar.gz
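# Count word frequencies in the training corpus
cat output/zork.corpus | ./text2wfreq > output/zork.wfreq
# Keep the 20,000 most frequent words as the vocabulary
cat output/zork.wfreq | ./wfreq2vocab -top 20000 > output/zork.vocab
# Convert the corpus into an id n-gram file over that vocabulary
cat output/zork.corpus | ./text2idngram -vocab output/zork.vocab > output/zork.idngram
# Build the language model in binary format
./idngram2lm -idngram output/zork.idngram -vocab output/zork.vocab -binary output/zork.binlm
# Build the same model in ARPA text format (the .lm file)
./idngram2lm -idngram output/zork.idngram -vocab output/zork.vocab -arpa output/zork.lm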
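Since Sphinx2 needs both the .dict and the .lm, it can help to check that every word in the language model vocabulary has a pronunciation. This is a minimal sketch under stated assumptions: missing_words is a hypothetical helper, header lines in the .vocab written by wfreq2vocab are assumed to start with #, the sentence markers <s> and </s> are skipped, and the comparison is exact (case-sensitive), so the corpus and the dictionary should use the same case.

```shell
# missing_words: hypothetical helper, not part of the CMU-Cambridge toolkit.
#   $1 = .vocab file from wfreq2vocab
#   $2 = pronunciation dictionary (.dict)
# Prints each vocabulary word that has no dictionary entry.
missing_words() {
  awk '
    # First file (the .dict): record headwords, folding WORD(2) into WORD.
    FILENAME == ARGV[1] { w = $1; sub(/\([0-9]+\)$/, "", w); have[w] = 1; next }
    # Vocab file: skip assumed header comments and blank lines.
    /^#/ || NF == 0 { next }
    # Sentence markers are not expected in the dictionary.
    $1 == "<s>" || $1 == "</s>" { next }
    !($1 in have) { print $1 }
  ' "$2" "$1"
}

# Example: missing_words output/zork.vocab output/zork.dict
```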
last edited September 23, 2006