2013-10-21

Syncing audio and video with OpenCV and PyAudio

I have OpenCV and PyAudio both working, but I don't know how to sync them together. I can't get a frame rate out of OpenCV, and the measured read time for a frame changes from moment to moment, whereas PyAudio works by grabbing audio at a fixed sample rate. How do I synchronize the two so they play back at the same speed? I assume there is some codec or standard way of doing this. (I tried Google and all I got was information on lip sync :/)
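The underlying idea of A/V sync is that both streams share one clock: at a fixed frame rate and audio sample rate, frame n always corresponds to a fixed sample offset. A minimal sketch of that mapping (the function names here are mine, not from OpenCV or PyAudio):

```python
def frame_to_sample(frame_index, fps, sample_rate):
    """Map a video frame index to the first audio sample of that frame."""
    return int(round(frame_index * sample_rate / fps))

def samples_per_frame(fps, sample_rate):
    """How many audio samples cover exactly one video frame."""
    return sample_rate / fps

# At 30 fps and 44100 Hz, each frame spans 1470 samples,
# so frame 30 (t = 1 s) starts at sample 44100.
print(samples_per_frame(30, 44100))    # 1470.0
print(frame_to_sample(30, 30, 44100))  # 44100
```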

Getting the frame rate with OpenCV

from __future__ import division 
import time 
import math 
import cv2, cv 

vc = cv2.VideoCapture(0) 

# Grab frames, timing each read 
while True: 
    before_read = time.time() 
    rval, frame = vc.read() 
    after_read = time.time() 

    if frame is not None: 
        print len(frame) 
        print math.ceil(1.0 / (after_read - before_read)) 
        cv2.imshow("preview", frame) 

        if cv2.waitKey(1) & 0xFF == ord('q'): 
            break 
    else: 
        print "None..." 
        cv2.waitKey(1) 

# Display the last frame 
while True: 
    cv2.imshow("preview", frame) 

    if cv2.waitKey(1) & 0xFF == ord('q'): 
        break 
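For reference, one way to hold a capture loop to a fixed rate is to schedule each iteration against a monotonic clock rather than timing individual reads. This is a sketch under assumptions (the `grab` callback stands in for `vc.read()`, which needs hardware; `paced_loop` is my own name):

```python
import time

def paced_loop(grab, target_fps, n_frames):
    """Call grab() n_frames times, sleeping so iterations start on a fixed period."""
    period = 1.0 / target_fps
    start = time.monotonic()
    for i in range(n_frames):
        grab()
        # Sleep until the next scheduled tick, absorbing jitter from grab().
        next_tick = start + (i + 1) * period
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return time.monotonic() - start

# Pacing 5 no-op "frames" at 50 fps should take roughly 5 * 20 ms.
elapsed = paced_loop(lambda: None, target_fps=50, n_frames=5)
print(round(elapsed, 2))
```

Scheduling against `start + i * period` instead of sleeping a fixed amount each time keeps slow reads from accumulating drift.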

Grabbing and saving audio

from sys import byteorder 
from array import array 
from struct import pack 

import pyaudio 
import wave 

THRESHOLD = 500 
CHUNK_SIZE = 1024 
FORMAT = pyaudio.paInt16 
RATE = 44100 

def is_silent(snd_data): 
    "Returns 'True' if below the 'silent' threshold" 
    print max(snd_data) 
    return max(snd_data) < THRESHOLD 

def normalize(snd_data): 
    "Average the volume out" 
    MAXIMUM = 16384 
    times = float(MAXIMUM) / max(abs(i) for i in snd_data) 

    r = array('h') 
    for i in snd_data: 
        r.append(int(i * times)) 
    return r 

def trim(snd_data): 
    "Trim the blank spots at the start and end" 
    def _trim(snd_data): 
        snd_started = False 
        r = array('h') 

        for i in snd_data: 
            if not snd_started and abs(i) > THRESHOLD: 
                snd_started = True 
                r.append(i) 
            elif snd_started: 
                r.append(i) 
        return r 

    # Trim to the left 
    snd_data = _trim(snd_data) 

    # Trim to the right 
    snd_data.reverse() 
    snd_data = _trim(snd_data) 
    snd_data.reverse() 
    return snd_data 

def add_silence(snd_data, seconds): 
    "Add silence to the start and end of 'snd_data' of length 'seconds' (float)" 
    r = array('h', [0 for i in xrange(int(seconds * RATE))]) 
    r.extend(snd_data) 
    r.extend([0 for i in xrange(int(seconds * RATE))]) 
    return r 

def record(): 
    """ 
    Record a word or words from the microphone and 
    return the data as an array of signed shorts. 

    Normalizes the audio, trims silence from the 
    start and end, and pads with 0.5 seconds of 
    blank sound to make sure VLC et al can play 
    it without getting chopped off. 
    """ 
    p = pyaudio.PyAudio() 
    stream = p.open(format=FORMAT, channels=1, rate=RATE, 
                    input=True, output=True, 
                    frames_per_buffer=CHUNK_SIZE) 

    num_silent = 0 
    snd_started = False 

    r = array('h') 

    while 1: 
        # little endian, signed short 
        snd_data = array('h', stream.read(CHUNK_SIZE)) 
        if byteorder == 'big': 
            snd_data.byteswap() 

        print len(snd_data) 

        r.extend(snd_data) 

        silent = is_silent(snd_data) 

        if silent and snd_started: 
            num_silent += 1 
        elif not silent and not snd_started: 
            snd_started = True 

        if snd_started and num_silent > 1: 
            break 

    sample_width = p.get_sample_size(FORMAT) 
    stream.stop_stream() 
    stream.close() 
    p.terminate() 

    r = normalize(r) 
    r = trim(r) 
    r = add_silence(r, 0.5) 
    return sample_width, r 

def record_to_file(path): 
    "Records from the microphone and outputs the resulting data to 'path'" 
    sample_width, data = record() 
    data = pack('<' + ('h' * len(data)), *data) 

    wf = wave.open(path, 'wb') 
    wf.setnchannels(1) 
    wf.setsampwidth(sample_width) 
    wf.setframerate(RATE) 
    wf.writeframes(data) 
    wf.close() 

if __name__ == '__main__': 
    print("please speak a word into the microphone") 
    record_to_file('demo.wav') 
    print("done - result written to demo.wav") 
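Once the WAV is written, its duration follows directly from the sample count and RATE, and that duration is the number the video side has to match. A small sketch of the arithmetic (pure computation, no PyAudio needed; the function names are mine):

```python
RATE = 44100  # samples per second, as in the recording code above

def audio_duration_seconds(n_samples, rate=RATE):
    """Duration of a mono 16-bit recording that holds n_samples samples."""
    return n_samples / rate

def frames_covered(n_samples, fps, rate=RATE):
    """How many video frames at the given fps span that much audio."""
    return int(audio_duration_seconds(n_samples, rate) * fps)

print(audio_duration_seconds(88200))   # 2.0 seconds
print(frames_covered(88200, fps=30))   # 60 frames
```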

If you have a working 'pyffmpeg' installed, you could try using 'ffmpeg''s video (and audio) display functionality instead of doing the video display with OpenCV. – boardrider

Answer


I think you would be better off using either GStreamer or ffmpeg, or DirectShow if you are on Windows. These libraries handle both audio and video, and should have some kind of muxer so you can mix the video and audio streams correctly.

But if you really want to do this with OpenCV, you should be able to use VideoCapture to get the frame rate. Have you tried this?

fps = vc.get(cv2.cv.CV_CAP_PROP_FPS) 

Another way is to estimate the FPS by dividing the frame count by the duration:

nFrames = vc.get(cv2.cv.CV_CAP_PROP_FRAME_COUNT) 
vc.set(cv2.cv.CV_CAP_PROP_POS_AVI_RATIO, 1) 
duration = vc.get(cv2.cv.CV_CAP_PROP_POS_MSEC) 
fps = 1000 * nFrames / duration 
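The same estimate as a plain function, with made-up numbers so no capture device is required (`estimate_fps` is my own name, not an OpenCV call):

```python
def estimate_fps(n_frames, duration_ms):
    """FPS from total frame count and total duration in milliseconds."""
    return 1000.0 * n_frames / duration_ms

# A 10-second clip (10000 ms) holding 250 frames plays at 25 fps.
print(estimate_fps(250, 10000))  # 25.0
```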

I am not sure I understand what you are trying to do here:

before_read = time.time() 
rval, frame = vc.read() 
after_read = time.time() 

It seems to me that after_read - before_read only measures how long it takes OpenCV to load the next frame; it does not measure the fps. OpenCV is not trying to do playback, it is only loading frames, and it will try to do so as fast as possible; I don't think there is a way to configure that. I think that putting a waitKey(int(1000/fps)) after displaying each frame will achieve what you are looking for.
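Note that cv2.waitKey takes an integer delay in milliseconds, so the per-frame delay for a target frame rate is 1000/fps truncated to an int, and it must stay at least 1 because waitKey(0) blocks forever. A sketch of that conversion (the helper name is mine):

```python
def waitkey_delay_ms(fps):
    """Per-frame cv2.waitKey delay (ms) for a target frame rate."""
    return max(1, int(1000 / fps))

print(waitkey_delay_ms(30))    # 33
print(waitkey_delay_ms(2000))  # 1  (clamped so waitKey never receives 0)
```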


@Zimm3r did it work? –


Even though this is very late: I did not use GStreamer because I have some specific goals, and I have run into problems with GStreamer in the past. – Zimm3r