2017-11-18 59 views
0

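Pocketsphinx in Python returns random words during keyword search

I copied code from a website to listen for a specific word using Python pocketsphinx. It runs, but it never outputs the keyword as expected. This is my code: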

import sys, os 
from pocketsphinx.pocketsphinx import * 
from sphinxbase.sphinxbase import * 
import pyaudio 

# modeldir = "../../../model" 
# datadir = "../../../test/data" 

modeldir="C://Users//hp//AppData//Local//Programs//Python//Python35//Lib//site-packages//pocketsphinx//model//en-us" 
dictdir="C://Users//hp//AppData//Local//Programs//Python//Python35//Lib//site-packages//pocketsphinx//model//cmudict-en-us.dict" 
lmdir="C://Users//hp//AppData//Local//Programs//Python//Python35//Lib//site-packages//pocketsphinx//model//en-us.lm.bin" 
# Create a decoder with certain model 
config = Decoder.default_config() 
config.set_string('-hmm', modeldir) 
config.set_string('-lm', lmdir) 
config.set_string('-dict', dictdir) 
config.set_string('-keyphrase', 'forward') 
config.set_float('-kws_threshold', 1e+20) 

p = pyaudio.PyAudio() 
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024) 
stream.start_stream() 

# Process audio chunk by chunk. On keyword detected perform action and restart search 
decoder = Decoder(config) 
decoder.start_utt() 
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        # print(decoder.hyp().hypstr)
        if decoder.hyp().hypstr == 'forward':
            print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
            print("Detected keyword, restarting search")
            decoder.end_utt()
            decoder.start_utt()

Also, when I use print(decoder.hyp().hypstr), it just prints random words whenever I say a word or a sentence. For example:

the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the da 
the head 
the bed 
the bedding 
the heading of 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and well 
the bedding and well 
the bedding and well 
the bedding and butler 
the bedding and what lingus 
the bedding and what lingus 
the bedding and what lingus 
the bedding and what lingus ha 
the bedding and blessed are 
the bedding and blessed are 
the bedding and what lingus on 
the bedding and what lingus want 
the bedding and what lingus want 
the bedding and what lingus want 
the bedding and what lingus want 
the bedding and what lingus want or 
the bedding and what lingus want to talk 
the bedding and what lingus current top 
the bedding and what lingus want to talk 
the bedding and what lingus want to talk 
the bedding and what lingus want to talk 
the bedding and what lingus want to talk 
the bedding and what lingus want to talk to her 
the bedding and what lingus want to talk to her 
the bedding and what lingus want to talk to her 
the bedding and what lingus want to talk to her 

Please help me get through this. I am just a Python beginner.

Answers

1

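First, I just want to clarify: your Pocketsphinx is working.

From my experience with pocketsphinx, it is hardly the most accurate speech recognition tool you can use, but it is probably your best option for an offline solution. Pocketsphinx can only translate your words (audio) as well as its model dictates. These models still seem to be a work in progress, and most of them need improvement. There are a few things you can try to improve recognition accuracy, such as reducing noise and tuning the recognition, but that is outside the direct scope of this question.

From what I understand of your code, you are waiting for a specific keyword to be spoken (by the user) and recognized using the pocketsphinx backend. That keyword appears to be "forward". You can read further on how to do "hot word listening" properly.

You have the right idea, but the approach can be improved. Here is my "quick fix" version of the code: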

import os 
import pyaudio 
import pocketsphinx as ps 

modeldir = "C://Users//hp//AppData//Local//Programs//Python//Python35//Lib//site-packages//pocketsphinx//model//" 

# Create a decoder with certain model 
config = ps.Decoder.default_config() 
config.set_string('-hmm', os.path.join(modeldir, 'en-us')) 
config.set_string('-lm', os.path.join(modeldir, 'en-us.lm.bin')) 
config.set_string('-dict', os.path.join(modeldir, 'cmudict-en-us.dict')) 
config.set_string('-keyphrase', 'forward') 
config.set_float('-kws_threshold', 1e+20) 

p = pyaudio.PyAudio() 
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024) 
stream.start_stream() 

# Process audio chunk by chunk. On keyword detected perform action and restart search 
decoder = ps.Decoder(config) 
decoder.start_utt() 

while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        print(decoder.hyp().hypstr)
        if 'forward' in decoder.hyp().hypstr:
            print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
            print("Detected keyword, restarting search")
            decoder.end_utt()
            decoder.start_utt()

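For any one pocketsphinx.Decoder() "session" (i.e. a call to the .start_utt() method followed later by a call to .end_utt()), the decoder.hyp().hypstr value keeps appending words to itself whenever pocketsphinx decodes a "valid" translation/recognition from the input audio stream.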

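You used if decoder.hyp().hypstr == 'forward':. This requires the whole string to be exactly "forward" before the code enters the conditional block (which I assume is what you expected... yes?). Since pocketsphinx is not very accurate by default, it usually takes several attempts before it actually registers the right word. For this reason, and because decoder.hyp().hypstr keeps appending to itself (as mentioned above), I used the line if 'forward' in decoder.hyp().hypstr: instead. This looks for the desired keyword "forward" anywhere in the whole string, so recognition errors are tolerated until the keyword is found.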
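To make the difference between the two checks concrete, here is a minimal sketch that does not need pocketsphinx at all. The first three strings are copied from the output in the question; the last one is a made-up hypothesis that happens to contain the keyword. The word_match helper is my own optional variation (not part of the code above): it compares whole words, so something like "straightforward" would not count as a hit.

```python
# Simulated partial hypotheses. The first three come from the question's
# output; the last one is hypothetical and only added for illustration.
partial_hypotheses = [
    "the",
    "the bedding and",
    "the bedding and what lingus want to talk",
    "the bedding and what lingus want to talk forward",  # hypothetical
]

KEYWORD = "forward"


def exact_match(hypstr, keyword=KEYWORD):
    """The question's check: the whole hypothesis must equal the keyword."""
    return hypstr == keyword


def substring_match(hypstr, keyword=KEYWORD):
    """The check used in this answer: the keyword appears anywhere."""
    return keyword in hypstr


def word_match(hypstr, keyword=KEYWORD):
    """Optional stricter variant: the keyword must be a whole word."""
    return keyword in hypstr.split()


exact_hits = [h for h in partial_hypotheses if exact_match(h)]
substring_hits = [h for h in partial_hypotheses if substring_match(h)]
word_hits = [h for h in partial_hypotheses if word_match(h)]

print("exact:", len(exact_hits))
print("substring:", len(substring_hits))
print("whole word:", len(word_hits))
```

Because the hypothesis keeps growing within one utterance, the exact comparison can only succeed if "forward" is the very first and only thing recognized, which is why it never fired in the question.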

I hope this helps!

+0

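Thanks for your answer, but this code did not help much, bro. It never recognizes the word "forward" in speech; it just prints random words when I talk to it. Is there something I am missing in the model? – TechieBoy101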

+0

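All this means is that pocketsphinx's "translation" is not very accurate for the data you are feeding it. So, as I pointed out, you will have to try a few (many) times before pocketsphinx recognizes your word correctly. I understand how unsatisfying that is. You would then need to look into **increasing the recognition accuracy** and into doing "hot word listening" **properly**. The links for both are in my original answer. –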

0

You need to remove this line:

config.set_string('-lm', lmdir) 

Keyword search and LM search are mutually exclusive.
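As a rough sketch of what that looks like in practice, the snippet below collects the decoder options for keyword spotting in a plain dict with no '-lm' entry, and applies them through the same set_string / set_float calls used in the question. The helper names keyword_only_options and apply_options are made up for this illustration, and "model" is only a placeholder for your own model directory.

```python
import os


def keyword_only_options(model_dir, keyphrase, threshold=1e+20):
    """Return decoder options for keyword search, deliberately without '-lm'.

    The threshold default is simply the value from the question's code.
    """
    return {
        "-hmm": os.path.join(model_dir, "en-us"),
        "-dict": os.path.join(model_dir, "cmudict-en-us.dict"),
        "-keyphrase": keyphrase,
        "-kws_threshold": threshold,
    }


def apply_options(config, options):
    """Copy the options into a config object that offers set_string/set_float."""
    for name, value in options.items():
        if isinstance(value, float):
            config.set_float(name, value)
        else:
            config.set_string(name, value)
    return config


# "model" is a placeholder path; point it at your own model directory.
options = keyword_only_options("model", "forward")
print(sorted(options))

# With pocketsphinx installed, the options could then be applied like this:
#   import pocketsphinx as ps
#   config = apply_options(ps.Decoder.default_config(), options)
#   decoder = ps.Decoder(config)
```

The rest of the loop from the question stays the same; the only change is that '-lm' is never set, so the decoder runs the keyword search instead of the language-model search.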

+0

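Thanks a lot, that was really useful. I need to ask whether there is a way to listen for more than one keyword or sentence in pocketsphinx. Is that possible? – TechieBoy101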

+0

Yes, you can use a keyword list, see http://cmusphinx.github.io/wiki/tutoriallm –
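For illustration, a keyword list as I understand it from that tutorial is a plain text file with one phrase per line, each followed by its own detection threshold between slashes. The sketch below only writes such a file; the phrases, the threshold values, the file name keyphrase.list and the helper write_keyword_list are all assumptions made for this example, and thresholds normally have to be tuned per phrase. Every word used in a phrase also has to exist in the dictionary file.

```python
import os
import tempfile


def write_keyword_list(path, keyphrases):
    """Write one 'phrase /threshold/' line per entry."""
    with open(path, "w") as f:
        for phrase, threshold in keyphrases:
            f.write("%s /%s/\n" % (phrase, threshold))


# Illustrative phrases and thresholds only; tune the thresholds yourself.
keyphrases = [("forward", "1e-20"), ("turn left", "1e-30")]

kws_path = os.path.join(tempfile.mkdtemp(), "keyphrase.list")
write_keyword_list(kws_path, keyphrases)

with open(kws_path) as f:
    kws_lines = f.read().splitlines()
print(kws_lines)

# Assumed usage with pocketsphinx: pass the file through the '-kws' option
# instead of '-keyphrase' / '-kws_threshold', and still leave out '-lm':
#   config.set_string('-kws', kws_path)
```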