Sphinx2

Notes on Sphinx2

Creating Language Models

 How to build the language model and dictionary for Sphinx2:
    http://www.speech.cs.cmu.edu/sphinx/doc/sphinx-FAQ.html
    Sphinx2 needs only a pronunciation dictionary (.dict) and a language model (.lm).
 Building a dictionary:
 http://www.speech.cs.cmu.edu/cgi-bin/cmudict
    ftp://ftp.cs.cmu.edu/afs/cs.cmu.edu/data/anonftp/project/fgdata/dict/
       ftp://ftp.cs.cmu.edu/afs/cs.cmu.edu/data/anonftp/project/fgdata/dict/cmudict.0.6.gz
       Grep the dictionary for the words you need.
       Strip the stress numbers (the digits on vowel phonemes, e.g. AH0 becomes AH) from the pronunciations.
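
The grep-and-strip step can be sketched as a small shell function. This is only an illustration, not part of the original notes: the function name make_dict, the word-list file, and the tab between word and phonemes are assumptions, and it assumes the unpacked dictionary lists words in upper case with "##" comment lines and alternate pronunciations written as WORD(2).

```shell
#!/bin/sh
# make_dict WORDS_FILE CMUDICT_FILE
# Hypothetical helper: prints the dictionary entries for the words listed
# (one per line) in WORDS_FILE, with stress digits removed from the phonemes.
make_dict() {
    awk '
        # First file: remember each wanted word, upper-cased to match the dictionary.
        FILENAME == ARGV[1] { if ($1 != "") want[toupper($1)] = 1; next }
        # Skip comment lines in the dictionary.
        /^##/ { next }
        {
            # Alternate pronunciations look like WORD(2); compare on the base word.
            base = $1
            sub(/\([0-9]+\)$/, "", base)
            if (!(base in want)) next
            # Rebuild the line, removing digits from the phonemes only,
            # so the (2) in the word field is left alone.
            line = $1
            for (i = 2; i <= NF; i++) {
                phone = $i
                gsub(/[0-9]/, "", phone)
                line = line (i == 2 ? "\t" : " ") phone
            }
            print line
        }
    ' "$1" "$2"
}
```

Assuming a word list in words.txt and the dictionary unpacked to cmudict.0.6 (both file names are placeholders), it would be called as: make_dict words.txt cmudict.0.6 > output/zork.dict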
 Building language models:
 http://www.speech.cs.cmu.edu/SLM/toolkit.html
    http://www.speech.cs.cmu.edu/SLM/CMU-Cam_Toolkit_v2.tar.gz
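 # Count word frequencies in the corpus
 ./text2wfreq < output/zork.corpus > output/zork.wfreq
 # Keep the 20000 most frequent words as the vocabulary
 ./wfreq2vocab -top 20000 < output/zork.wfreq > output/zork.vocab
 # Convert the corpus to id n-grams over that vocabulary
 ./text2idngram -vocab output/zork.vocab < output/zork.corpus > output/zork.idngram
 # Write the model in binary form, and in ARPA form (the .lm file)
 ./idngram2lm -idngram output/zork.idngram -vocab output/zork.vocab -binary output/zork.binlm
 ./idngram2lm -idngram output/zork.idngram -vocab output/zork.vocab -arpa output/zork.lm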
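
For intuition, the first two steps (word counts, then a top-N vocabulary) can be approximated with standard Unix tools. This is a rough sketch, not a replacement for the toolkit: the function names word_freq and top_vocab are made up here, and the toolkit's real file formats and tie-breaking may differ.

```shell
#!/bin/sh
# word_freq CORPUS_FILE
# Hypothetical stand-in for the word-counting step: prints "word count" lines.
word_freq() {
    # Put one word per line, drop empty lines, then count each distinct word.
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d' | LC_ALL=C sort | uniq -c |
        awk '{ print $2, $1 }'
}

# top_vocab N
# Hypothetical stand-in for the vocabulary step: reads "word count" lines on
# stdin and prints the N most frequent words, sorted alphabetically.
top_vocab() {
    # Sort by count (descending), break ties by word, keep the first N.
    LC_ALL=C sort -k2,2nr -k1,1 | head -n "$1" | awk '{ print $1 }' | LC_ALL=C sort
}
```

With the corpus from the commands above, this would be used as: word_freq output/zork.corpus | top_vocab 20000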