2011-09-01

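Flash Speex codec for the Google Speech API: a challenge

People have already figured out how to use the Google Speech API (speech-to-text). I am trying to get it to work with the Flash Speex codec, but I can't figure it out. I tried inserting a frame-size byte before every 160 bytes (as some sources suggest), but that does not work.

So I am posting a challenge: somehow translate the Flash Speex bytes into something the Google Speech API understands.

Here is the basic Flex code: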

<?xml version="1.0" encoding="utf-8"?> 
<s:Application xmlns:fx="http://ns.adobe.com/mxml/2009" 
      xmlns:s="library://ns.adobe.com/flex/spark" 
      creationComplete="init();"> 
<fx:Script> 
    <![CDATA[ 
     import flash.events.Event;
     import flash.events.IOErrorEvent;
     import flash.events.SampleDataEvent;
     import flash.media.Microphone;
     import flash.media.SoundCodec;
     import flash.net.URLLoader;
     import flash.net.URLRequest;
     import flash.net.URLRequestMethod;
     import flash.utils.ByteArray;
     import flash.utils.Endian;

     // Speech API info 
     // Reference: http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/, 
     // Reference: https://stackoverflow.com/questions/4361826/does-chrome-have-buil-in-speech-recognition-for-input-type-text-x-webkit-speec 
     private static const speechApiUrl:String = "http://www.google.com/speech-api/v1/recognize"; 
     private static const speechLanguage:String = "en"; 
     private static const mimeType:String = "audio/x-speex-with-header-byte"; 
     private static const sampleRate:uint = 8; 

     // Sound bytes & mic 
     private var soundBytes:ByteArray; 
     private var microphone:Microphone; 

     // Initial setup   
     private function init():void { 
      // Set up the microphone 
      microphone = Microphone.getMicrophone(); 
      // Speech API supports 8khz and 16khz rates 
      microphone.rate = sampleRate; 
      // Select the SPEEX codec 
      microphone.codec = SoundCodec.SPEEX; 
      // I don't know what effect this has... 
      microphone.framesPerPacket = 1; 
     } 

     // THIS IS THE CHALLENGE 
     // We have the flash speex bytes and we need to translate them so Google API understands 
     private function process():void{ 
      soundBytes.position = 0; 

      var processed:ByteArray = new ByteArray(); 
      processed.endian = Endian.BIG_ENDIAN; 
      var frameSize:uint = 160; 

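      // Walk the recording in 160-byte chunks; a trailing partial chunk is dropped
      for(var n:uint = 0; n < soundBytes.bytesAvailable/frameSize; n++){ 
       // One-byte header holding the chunk size (160 = 0xA0)
       processed.writeByte(frameSize); 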

       processed.writeBytes(soundBytes, frameSize * n, frameSize); 
      } 

      processed.position = 0; 

      soundBytes = processed; 
     } 

     // Sending to Google Speech server 
     private function send():void { 
      var loader:URLLoader = new URLLoader(); 

      var request:URLRequest = new URLRequest(speechApiUrl + "?lang=" + speechLanguage); 
      request.method = URLRequestMethod.POST; 
      request.data = soundBytes; 
      request.contentType = mimeType + "; rate=" + (1000 * sampleRate); 

      loader.addEventListener(Event.COMPLETE, onComplete); 
      loader.addEventListener(IOErrorEvent.IO_ERROR, onError); 
      loader.load(request); 

      trace("Connecting to Speech API server"); 
     } 

     private function onError(event:IOErrorEvent):void{ 
      trace("Error: " + event.toString()); 
     } 

     private function onComplete(event:Event):void{ 
      trace("Done: " + event.target.data); 
     } 

     private function record(event:Event):void{ 
      soundBytes = new ByteArray(); 
      soundBytes.endian = Endian.BIG_ENDIAN; 

      microphone.addEventListener(SampleDataEvent.SAMPLE_DATA, sampleData); 
     } 

     private function sampleData(event:SampleDataEvent):void {    
      soundBytes.writeBytes(event.data, 0, event.data.bytesAvailable); 
     } 

     private function stop(e:Event):void { 
      microphone.removeEventListener(SampleDataEvent.SAMPLE_DATA, sampleData); 

      if(soundBytes != null){ 
       process(); 
       send(); 
      } 
     }  
    ]]> 
</fx:Script> 

<s:HGroup> 
    <s:Button label="Record" 
       click="record(event)"/> 
    <s:Button label="Stop and Send" 
       click="stop(event)"/> 
</s:HGroup> 
</s:Application> 

For more information, see these links: http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ and the Stack Overflow question Does Chrome have built-in speech recognition for "x-webkit-speech" input elements?

Answer

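The code you are looking for is around lines 100-160 of http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/speech_recognizer.cc?view=diff&r1=79556&r2=79557, which in turn #includes .../viewvc/chrome/trunk/deps/third_party/speex/

However, Chrome switched from Speex to FLAC at the end of March, with no real explanation in the change log (http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/speech_recognizer.cc?view=diff&r1=79556&r2=79557), so I would not recommend using Speex. On the other hand, someone looked at the Android source code and said they still use Speex there, so the server will probably keep supporting it (it needs less than half as many bytes per second).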
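As a rough illustration of the framing that the audio/x-speex-with-header-byte MIME type seems to expect: as far as I can tell from the Chromium encoder referenced above, the header byte holds the length in bytes of each encoded Speex frame, not the constant 160 (160 is the number of samples in one 20 ms narrowband frame at 8 kHz; the encoded size depends on the bitrate). The sketch below assumes you already have the encoded frames as separate ByteArray objects, which the question's code does not provide. frameWithHeaderBytes is a hypothetical helper, not part of any Flash or Google API, and the framing itself should be checked against the Chromium source.

```actionscript
package {
    import flash.utils.ByteArray;

    /**
     * Hypothetical helper (an assumption, not an official API).
     * Prefixes every already-encoded Speex frame with one byte that
     * holds the frame's encoded length in bytes.
     */
    public function frameWithHeaderBytes(encodedFrames:Vector.<ByteArray>):ByteArray {
        var out:ByteArray = new ByteArray();

        for each (var frame:ByteArray in encodedFrames) {
            // The length has to fit in the single header byte
            if (frame.length == 0 || frame.length > 255) {
                throw new ArgumentError("Frame length must be 1..255 bytes, got " + frame.length);
            }
            out.writeByte(frame.length);            // header: encoded size, not 160
            out.writeBytes(frame, 0, frame.length); // payload: the encoded frame itself
        }

        out.position = 0; // rewind so the result can be used as request.data
        return out;
    }
}
```

With frames of varying size, a fixed writeByte(160) in front of every 160 bytes would not match this layout, which may be why the approach in the question is rejected.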