Installation

Install a specific version (the default Homebrew formula is version 0.8)

$ export HOMEBREW_NO_ENV_FILTERING=1
$ brew bump-formula-pr cmu-sphinxbase --url=https://jaist.dl.sourceforge.net/project/cmusphinx/sphinxbase/5prealpha/sphinxbase-5prealpha.tar.gz --audit --strict

Install from source (recommended)

$ brew install --build-from-source --HEAD cmu-sphinxbase
$ brew install --build-from-source --HEAD cmu-pocketsphinx

Audio format conversion

The input needs to be a single-channel (monaural), little-endian, unheadered 16-bit signed PCM audio file sampled at 16000 Hz.
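As a rough pre-flight check, the header of a converted .wav file can be inspected before it is passed to pocketsphinx. This is only a sketch: it assumes the fmt chunk directly follows the RIFF header (the common layout) and a little-endian host, and wav_field and check_wav are hypothetical helper names, not part of any Sphinx tool.

```shell
# Read an unsigned integer of SIZE bytes at byte OFFSET of FILE.
# Assumption: host byte order is little-endian, like the WAV format.
wav_field() {
    od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# Print "ok" if FILE is mono, 16000 Hz, 16-bit; otherwise describe the mismatch.
check_wav() {
    channels=$(wav_field "$1" 22 2)   # channel count
    rate=$(wav_field "$1" 24 4)       # sample rate in Hz
    bits=$(wav_field "$1" 34 2)       # bits per sample
    if [ "$channels" = 1 ] && [ "$rate" = 16000 ] && [ "$bits" = 16 ]; then
        echo "ok"
    else
        echo "bad: channels=$channels rate=$rate bits=$bits"
        return 1
    fi
}
```

Usage: check_wav output.wav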

Single file

$ ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav
$ ffmpeg -i a001_huhao.mp3 -acodec pcm_s16le -ac 1 -ar 16000 a001_huhao.wav

Batch conversion

$ for f in *.mp3; do ffmpeg -i "$f" -acodec pcm_s16le -ac 1 -ar 16000 "${f%.mp3}.wav"; done
$ for f in *.wav; do mv "${f%.wav}.wav" "${f%.wav}"; done

Delete wav files smaller than 100K

$ for f in *.wav; do if [ `ls -l "$f" | awk '{print $5}'` -lt $((100*1024)) ]; then /bin/rm "$f"; fi; done
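Because the deletion is irreversible, it can help to list the candidates first. A minimal sketch, assuming sizes are compared in bytes as reported by wc -c; list_small_wavs is a hypothetical helper name.

```shell
# Print every .wav file in DIR that is smaller than LIMIT bytes (default 100 KiB).
# Nothing is deleted; the output can be reviewed before removing anything.
list_small_wavs() {
    dir=$1
    limit=${2:-102400}
    for f in "$dir"/*.wav; do
        [ -f "$f" ] || continue                  # skip the unexpanded glob
        size=$(wc -c < "$f" | tr -d ' ')         # BSD wc pads with spaces
        if [ "$size" -lt "$limit" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Usage: list_small_wavs . 102400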

Merging files: join the wav file names with the transcript file line by line

$ awk '{getline x< "ticket_mp3"; split(x, a, "/"); printf("%s (%s)\n", $0, substr(a[5],1, index(a[5], ".")-1));}' ticket_trans
$ awk '{printf("%s", $0); getline < "ticket_mp3"; split($0, a, "/"); printf(" (%s)\n", substr(a[5],1, index(a[5], ".")-1));}' ticket_trans
$ paste -d '' ticket_transcription ticket_wav
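The awk one-liners above hard-code the fifth path component (a[5]). The sketch below is a variant that takes the last path component instead, so it should not depend on directory depth; make_transcription is a hypothetical helper name, and the "text (fileid)" output format is the one the commands above produce.

```shell
# Join each transcript line with the matching audio path, line by line.
# $1: transcript file, $2: file with one audio path per line.
make_transcription() {
    awk -v ids="$2" '{
        if ((getline path < ids) <= 0) exit 1   # ran out of audio paths
        n = split(path, parts, "/")
        name = parts[n]                         # file name without directories
        sub(/\.[^.]*$/, "", name)               # drop the extension
        printf("%s (%s)\n", $0, name)
    }' "$1"
}
```

Usage: make_transcription ticket_trans ticket_mp3 > ticket_transcription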

Testing the sphinxbase and pocketsphinx libraries

Running from C

$ gcc -o hello_ps hello_ps.c -DMODELDIR=\"`pkg-config --variable=modeldir pocketsphinx`\" `pkg-config --cflags --libs pocketsphinx sphinxbase`

Running with the bundled tools

$ pocketsphinx_continuous -inmic yes
$ pocketsphinx_continuous -infile test.wav
$ pocketsphinx_continuous -lm hotel_book.lm.bin -infile test.wav 
$ pocketsphinx_continuous -infile <your_file.wav> -keyphrase <your keyphrase> -kws_threshold <your_threshold> -time yes
$ pocketsphinx_continuous -infile goforward.raw -kws_threshold 1e-20 -time yes

Extending the dict

About the dict model

The dict is a pronunciation dictionary that gives the phoneme sequence for each word, e.g.
hello H EH L OW
world W ER L D
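Before extending the dict it is useful to know which words in the training text have no pronunciation yet. A minimal sketch, assuming the text is already tokenized on whitespace and the dict keys are lowercase; missing_words is a hypothetical helper name.

```shell
# Print the words of TEXT that have no entry in DICT, one per line.
# $1: pronunciation dictionary ("word PH1 PH2 ..."), $2: plain-text corpus.
missing_words() {
    tr -s '[:space:]' '\n' < "$2" |
        tr '[:upper:]' '[:lower:]' |
        sort -u |
        awk 'NR == FNR { known[$1] = 1; next }   # first input: the dict
             NF && !($1 in known)                # second input: corpus words
        ' "$1" -
}
```

The resulting list can then be fed to a grapheme-to-phoneme tool to generate the missing entries.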

Install g2p-seq2seq, tensorflow and tensor2tensor

$ pip install tensorflow
$ pip install tensor2tensor
$ git clone https://github.com/cmusphinx/g2p-seq2seq && cd g2p-seq2seq
$ git checkout -b 6.2.0 605970f2639938b9594ea2f8ab79916a0cf4d6aa        # interactive mode works in this version
$ python setup.py install

Using g2p-seq2seq

$ g2p-seq2seq --model_dir g2p-seq2seq-cmudict --interactive 

Extending the language model

Install SRILM - The SRI Language Modeling Toolkit

$ wget http://www.speech.sri.com/projects/srilm/srilm_download.php 
$ tar -zxvf srilm-1.7.2.tar.gz 
$ export SRILM=/Users/wukong/Workspace/sphinx/srilm
$ gnumake World
$ export PATH="/Users/wukong/Workspace/sphinx/srilm/macos:/Users/wukong/Workspace/sphinx/srilm:$PATH"

Training the language model

$ ngram-count -kndiscount -interpolate -text train_lm.txt -lm hotel_book.lm             # train_lm.txt holds the training text, one word or one sentence per line
$ ngram-count -text train_lm.txt -lm hotel_book.lm                                      # if the data set is too small, do not use -kndiscount -interpolate
$ ngram -lm hotel_book.lm -prune 1e-8 -write-lm hotel_book-pruned.lm
$ ngram -lm hotel_book-pruned.lm -ppl test_lm.txt
$ ngram-count -text train_ticket_lm.txt -lm train_ticket.lm
$ ngram -lm train_ticket.lm -prune 1e-8 -write-lm train_ticket-pruned.lm
$ sphinx_lm_convert -i train_ticket-pruned.lm -o train_ticket.lm.bin

Converting between ARPA and bin formats

$ sphinx_lm_convert -i hotel_book-pruned.lm -o hotel_book.lm.bin                        # arpa --> bin
$ sphinx_lm_convert -i hotel_book.lm.bin -ifmt bin -o hotel_book.lm -ofmt arpa          # bin --> arpa
$ sphinx_lm_convert -i en-us.lm.bin -ifmt bin -o en-us.lm -ofmt arpa
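Once a model is in ARPA text format, its \data\ header shows how many n-grams of each order it holds, which is a quick way to see the effect of pruning. A minimal sketch, assuming the standard ARPA header lines of the form "ngram N=COUNT"; arpa_counts is a hypothetical helper name.

```shell
# Print "ORDER COUNT" for each n-gram order listed in the ARPA header of FILE.
arpa_counts() {
    awk '/^\\data\\/         { in_hdr = 1; next }     # header starts here
         in_hdr && /^ngram / { split($2, kv, "="); print kv[1], kv[2]; next }
         in_hdr && /^\\/     { exit }                 # first n-gram section
    ' "$1"
}
```

Usage: arpa_counts hotel_book.lm; arpa_counts hotel_book-pruned.lm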

Testing the new language model

$ pocketsphinx_continuous -inmic yes -lm hotel_book-pruned.lm -dict en-us.dic 

Extending the acoustic model

It’s enough to have 5 minutes of speech to significantly improve the dictation accuracy by adapting to the particular speaker.

Install sphinxtrain

$ git clone https://github.com/cmusphinx/sphinxbase.git
$ (cd sphinxbase && ./autogen.sh && make)          # generates src/libsphinxbase/libsphinxbase.la, which the next step needs
$ git clone https://github.com/cmusphinx/sphinxtrain
$ (cd sphinxtrain && ./autogen.sh && ./configure && make && make install)
$ export PATH="/usr/local/libexec/sphinxtrain:$PATH"            # provides bw, mllr_solve, map_adapt, mk_s2sendump and the other tools used in later steps

Copy the relevant model files

$ cp /Users/wukong/.pyenv/versions/3.6.3/envs/speech/lib/python3.6/site-packages/pocketsphinx/model/cmudict-en-us.dict .
$ cp /Users/wukong/.pyenv/versions/3.6.3/envs/speech/lib/python3.6/site-packages/pocketsphinx/model/en-us.lm.bin .
$ cp -r /Users/wukong/.pyenv/versions/3.6.3/envs/speech/lib/python3.6/site-packages/pocketsphinx/model/en-us en-us-default     # the full version of the hmm model is downloaded in a later step

Extract the acoustic model

$ wget -O cmusphinx-en-us-ptm-5.2.tar.gz https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-ptm-5.2.tar.gz/download
$ tar -zxvf cmusphinx-en-us-ptm-5.2.tar.gz && mv cmusphinx-en-us-ptm-5.2 en-us          # use the full model; make sure mdef is in text format and the mixture_weights file is present
$ sphinx_fe -argfile en-us/feat.params -samprate 16000 -c arctic20.fileids -di . -do . -ei wav -eo mfc -mswav yes    # generate the acoustic features (an mfc file) for each wav file
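The control file passed with -c lists one recording per line without its extension (the extension is supplied by -ei). A minimal sketch for generating such a list from a directory of wav files; make_fileids is a hypothetical helper name, and it assumes all recordings sit directly in one directory.

```shell
# Print the base name (no directory, no .wav extension) of every wav file in DIR.
make_fileids() {
    for f in "$1"/*.wav; do
        [ -f "$f" ] || continue        # skip the unexpanded glob
        name=${f##*/}                  # strip the directory part
        printf '%s\n' "${name%.wav}"   # strip the extension
    done
}
```

Usage: make_fileids . > arctic20.fileids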

Convert mdef to text format

$ pocketsphinx_mdef_convert -text en-us/mdef en-us/mdef.txt                             # if mdef is binary, convert it to the text-format mdef.txt
$ cp en-us/mdef en-us/mdef.txt                                                          # if mdef is already in text format, just make a copy

Generate the adapted model

$ bw -hmmdir en-us -moddeffn en-us/mdef.txt -ts2cbfn .ptm. -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none -dictfn cmudict-en-us.dict -ctlfn arctic20.fileids -lsnfn arctic20.transcription -accumdir .                             # en-us/mdef must be converted to text format first
$ mllr_solve -meanfn en-us/means -varfn en-us/variances -outmllrfn mllr_matrix -accumdir .
$ cp -a en-us en-us-adapt
$ map_adapt -moddeffn en-us/mdef.txt -ts2cbfn .ptm. -meanfn en-us/means -varfn en-us/variances -mixwfn en-us/mixture_weights -tmatfn en-us/transition_matrices -accumdir . -mapmeanfn en-us-adapt/means -mapvarfn en-us-adapt/variances -mapmixwfn en-us-adapt/mixture_weights -maptmatfn en-us-adapt/transition_matrices
$ mk_s2sendump -pocketsphinx yes -moddeffn en-us-adapt/mdef.txt -mixwfn en-us-adapt/mixture_weights -sendumpfn en-us-adapt/sendump
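The bw step reads the control file and the transcription in parallel, so the two need to line up entry by entry. The sketch below checks that the last field of each transcription line is the matching file id in parentheses; check_alignment is a hypothetical helper name, and the "text (fileid)" layout is assumed from the transcription format used earlier in these notes.

```shell
# Verify that line N of TRANSCRIPTION ends with "(id)" for line N of FILEIDS.
# $1: fileids file, $2: transcription file. Returns non-zero on any mismatch.
check_alignment() {
    awk -v ids="$1" '{
        if ((getline id < ids) <= 0) {
            print "line " NR ": no file id"; bad = 1; next
        }
        n = split(id, parts, "/")
        want = "(" parts[n] ")"              # id without its directory part
        if ($NF != want) {
            print "line " NR ": expected " want ", got " $NF; bad = 1
        }
    }
    END { exit bad }' "$2"
}
```

Usage: check_alignment arctic20.fileids arctic20.transcription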

Testing the adapted model

$ pocketsphinx_continuous -hmm en-us-adapt -lm en-us.lm.bin -dict cmudict-en-us.dict -infile arctic_a0001.wav           # use the new model
$ pocketsphinx_continuous -hmm en-us-default -lm en-us.lm.bin -dict cmudict-en-us.dict -infile arctic_a0001.wav         # use the default model

Measuring accuracy

word_align.pl: a Perl script shipped with sphinxtrain

$ cp ../sphinxtrain/scripts/decode/word_align.pl .
$ chmod u+x word_align.pl

$ pocketsphinx_batch -adcin yes -cepdir wav -cepext .wav -ctl test.fileids -lm en-us.lm.bin -dict cmudict-en-us.dict -hmm en-us -hyp test.hyp
$ ./word_align.pl test.transcription test.hyp
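For a quick sanity check that does not need the Perl script, a cruder metric is the share of utterances decoded exactly right. This is a sketch, not a replacement for word_align.pl: it assumes both files hold one utterance per line in the same order, each ending with a parenthesised tag such as "(fileid)" or "(fileid score)", and sentence_accuracy is a hypothetical helper name.

```shell
# Print "MATCHED/TOTAL": utterances whose hypothesis equals the reference text.
# $1: reference transcription, $2: hypothesis file.
sentence_accuracy() {
    awk -v hyp="$2" '
    # Drop the trailing "(...)" tag and ignore case.
    function strip(s) { sub(/[ \t]*\([^()]*\)[ \t]*$/, "", s); return tolower(s) }
    {
        total++
        if ((getline h < hyp) > 0 && strip($0) == strip(h)) hit++
    }
    END { printf("%d/%d\n", hit, total) }' "$1"
}
```

Usage: sentence_accuracy test.transcription test.hyp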

Troubleshooting

Installing the rubocop gem fails

Error message:
==> Installing or updating 'rubocop' gem
ERROR: Error installing rubocop:
invalid gem: package is corrupt, exception while verifying: undefined method `size' for nil:NilClass (NoMethodError) in /usr/local/Homebrew/Library/Homebrew/vendor/bundle/ruby/2.3.0/cache/parser-2.5.1.2.gem
Error: Failed to install/update the 'rubocop' gem.

Solution: use the system ruby or the ruby selected by rbenv instead of the ruby bundled with Homebrew.

$ rm -rf /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby
$ rm -rf /usr/local/Homebrew/Library/Homebrew/vendor/bundle/ruby

Installing sphinxbase or pocketsphinx hits a network error and the git download fails

Solution: clone the git repository manually into the brew cache directory, then run the brew install again.

$ git clone --depth 1 --branch master https://github.com/cmusphinx/pocketsphinx.git /Users/wukong/Library/Caches/Homebrew/cmu-pocketsphinx--git
$ brew install --build-from-source --HEAD cmu-pocketsphinx

References