2010-11-21 39 views
2

Greetings — DirectShow audio/video PTS clock computation

I have written a DirectShow source filter that takes AVC video frames / AAC access units from an ATSC-153 broadcast, written on a WinCE/ARM video processor. The output pins (two of them, one for video, one for audio) are connected to the appropriate decoders and renderers. Currently, I am taking the PTS values from the appropriate RTP headers, passing them into the source filter, and performing the computation against the DirectShow clock. The video PTS runs at a 90 kHz rate; the audio PTS rate varies — my current test stream's audio ticks at 55.2 kHz.

What follows are the convert_to_dshow_timestamp() and FillBuffer() routines. As the filter retrieves the video/audio, I print out the converted timestamps, and they stay within 100–200 ms of each other. That would not be bad, something I could work with. However, the video plays behind the audio by 2–3 seconds.

/* Routine to convert the clock rate to the DirectShow clock rate */ 
static unsigned long long convert_to_dshow_timestamp( 
    unsigned long long ts, 
    unsigned long rate 
) 
{ 
long double hz; 
long double multi; 
long double tmp;

if (rate == 0) 
{ 
    return 0; 
} 

hz = (long double) 1.0/rate; 
multi = hz/1e-7; 

tmp = ((long double) ts * multi) + 0.5; 
return (unsigned long long) tmp; 

}
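For reference, the same conversion can be done purely with 64-bit integer arithmetic, avoiding long-double rounding entirely. This is a minimal sketch, not the filter's actual code; it assumes ts * 10,000,000 fits in 64 bits (true for any realistic session length at a 90 kHz clock):

```cpp
#include <cstdint>

// Convert a timestamp in 'rate' ticks/sec to DirectShow 100-ns units,
// rounding to nearest. Equivalent to the long-double version above.
static uint64_t convert_to_dshow_timestamp_int(uint64_t ts, uint32_t rate)
{
    if (rate == 0)
        return 0;
    return (ts * 10000000ULL + rate / 2) / rate;
}
```

For example, 90000 ticks at a 90 kHz clock is exactly one second, i.e. 10,000,000 DirectShow units.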

/* Source filter FillBuffer() routine */ 
HRESULT OutputPin::FillBuffer(IMediaSample *pSamp) 
{ 
BYTE *pData; 
DWORD dataSize; 
pipeStream stream; 
BOOL retVal; 
DWORD returnBytes; 
HRESULT hr; 
DWORD discont; 
REFERENCE_TIME ts; 
REFERENCE_TIME df; 
unsigned long long difPts; 
unsigned long long difTimeRef;

pSamp->GetPointer(&pData); 
dataSize = pSamp->GetSize(); 

ZeroMemory(pData, dataSize); 

stream.lBuf = pData; 
stream.dataSize = dataSize; 

/* Pin type 1 is H.264 AVC video frames */ 
if (m_iPinType == 1) 
{ 
    retVal = DeviceIoControl(
           ghMHTune, 
           IOCTL_MHTUNE_RVIDEO_STREAM, 
           NULL, 
           0, 
           &stream, 
           sizeof(pipeStream), 
           &returnBytes, 
           NULL 
          ); 
    if (retVal == TRUE) 
    { 
     /* Get the data */ 
     /* Check for the first of the stream, if so, set the start time */ 
     pSamp->SetActualDataLength(returnBytes); 
     hr = S_OK; 
     if (returnBytes > 0) 
     { 
      /* The discontinuity is set in upper layers, when an RTP 
      * sequence number has been lost. 
      */ 
      discont = stream.discont; 

      /* Check for another break in stream time */ 
      if (
       m_PrevTimeRef && 
       ((m_PrevTimeRef > (stream.timeRef + 90000 * 10)) || 
       ((m_PrevTimeRef + 90000 * 10) < stream.timeRef)) 
       ) 
      { 
       dbg_log(TEXT("MY:DISC HERE\n")); 
       if (m_StartStream > 0) 
       { 
        discont = 1; 
       } 
      } 

      /* If the stream has not started yet, or there is a 
      * discontinuity, then reset the stream time. 
      */ 
      if ((m_StartStream == 0) || (discont != 0)) 
      { 
       sys_time = timeGetTime() - m_ClockStartTime; 
       m_OtherSide->sys_time = sys_time; 

       /* For Video, the clockRate is 90Khz */ 
       m_RefGap = (sys_time * (stream.clockRate/1000)) + 
                (stream.clockRate/2); 

       /* timeRef is the PTS for the frame from the RTP header */ 
       m_TimeGap = stream.timeRef; 
       m_StartStream = 1; 
       difTimeRef = 1; 
       m_PrevPTS = 0; 
       m_PrevSysTime = timeGetTime(); 
       dbg_log(
         TEXT("MY:StartStream %lld: %lld: %lld\n"), 
         sys_time, 
         m_RefGap, 
         m_TimeGap 
         ); 
      } 
      else 
      { 
       m_StartStream++; 
      } 

      difTimeRef = stream.timeRef - m_PrevTimeRef; 
      m_PrevTimeRef = stream.timeRef; 

      /* Difference in 90 Khz clocking */ 
      ts = stream.timeRef - m_TimeGap + m_RefGap; 
      ts = convert_to_dshow_timestamp(ts, stream.clockRate); 

      if (discont != 0) 
      { 
       dbg_log(TEXT("MY:VDISC TRUE\n")); 
       pSamp->SetDiscontinuity(TRUE); 
      } 
      else 
      { 
       pSamp->SetDiscontinuity(FALSE); 
       pSamp->SetSyncPoint(TRUE); 
      } 

      difPts = ts - m_PrevPTS; 

      df = ts + 1; 
      m_PrevPTS = ts; 
      dbg_log(
        TEXT("MY:T %lld: %lld = %lld: %d: %lld\n"), 
        ts, 
        m_OtherSide->m_PrevPTS, 
        stream.timeRef, 
        (timeGetTime() - m_PrevSysTime), 
        difPts 
        ); 

      pSamp->SetTime(&ts, &df); 
      m_PrevSysTime = timeGetTime(); 
     } 
     else 
     { 
      Sleep(10); 
     } 
    } 
    else 
    { 
     dbg_log(TEXT("MY: Fill FAIL\n")); 
     hr = E_FAIL; 
    } 
} 
else if (m_iPinType == 2) 
{ 
    /* Pin Type 2 is audio AAC Access units, with ADTS headers */ 
    retVal = DeviceIoControl(
           ghMHTune, 
           IOCTL_MHTUNE_RAUDIO_STREAM, 
           NULL, 
           0, 
           &stream, 
           sizeof(pipeStream), 
           &returnBytes, 
           NULL 
          ); 

    if (retVal == TRUE) 
    { 
     /* Get the data */ 
     /* Check for the first of the stream, if so, set the start time */ 
     hr = S_OK; 
     if (returnBytes > 0) 
     { 
      discont = stream.discont; 
      if ((m_StartStream == 0) || (discont != 0)) 
      { 
       sys_time = timeGetTime() - m_ClockStartTime; 
       m_RefGap = (sys_time * (stream.clockRate/1000)) + 
                (stream.clockRate/2); 

       /* Mark the first PTS from stream. This PTS is from the 
       * RTP header, and is usually clocked differently than the 
       * video clock. 
       */ 
       m_TimeGap = stream.timeRef; 
       m_StartStream = 1; 
       difTimeRef = 1; 
       m_PrevPTS = 0; 
       m_PrevSysTime = timeGetTime(); 
       dbg_log(
         TEXT("MY:AStartStream %lld: %lld: %lld\n"), 
         sys_time, 
         m_RefGap, 
         m_TimeGap 
         ); 
      } 

      /* Let the video side stream in first before letting audio 
      * start to flow. 
      */ 
      if (m_OtherSide->m_StartStream < 32) 
      { 
       pSamp->SetActualDataLength(0); 
       Sleep(10); 
       return hr; 
      } 
      else 
      { 
       pSamp->SetActualDataLength(returnBytes); 
      } 

      difTimeRef = stream.timeRef - m_PrevTimeRef; 
      m_PrevTimeRef = stream.timeRef; 

      if (discont != 0) 
      { 
       dbg_log(TEXT("MY:ADISC TRUE\n")); 
       pSamp->SetDiscontinuity(TRUE); 
      } 
      else 
      { 
       pSamp->SetDiscontinuity(FALSE); 
       pSamp->SetSyncPoint(TRUE); 
      } 

      /* Difference in Audio PTS clock, TESTING AT 55.2 Khz */ 
      ts = stream.timeRef - m_TimeGap + m_RefGap; 
      ts = convert_to_dshow_timestamp(ts, stream.clockRate); 

      difPts = ts - m_PrevPTS; 

      df = ts + 1; 
      m_PrevPTS = ts; 
      dbg_log(
        TEXT("MY:AT %lld = %lld: %d: %lld\n"), 
        ts, 
        stream.timeRef, 
        (timeGetTime() - m_PrevSysTime), 
        difPts 
        ); 

      pSamp->SetTime(&ts, &df); 
      m_PrevSysTime = timeGetTime(); 
     } 
     else 
     { 
      pSamp->SetActualDataLength(0); 
      Sleep(10); 
     } 
    } 
} 
return hr; 

} 
/* End of code */

I tried adjusting the video PTS by simply adding (90000 × 10) to see whether the video would end up way ahead of the audio, but it did not. The video still plays 2 seconds or more behind the audio. I really do not understand why this does not work. Each video frame should be presented 10 seconds early. Is this not correct?
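As a sanity check on that experiment: at a 90 kHz PTS clock, an offset of 90000 × 10 ticks really is 10 seconds, i.e. 100,000,000 DirectShow 100-ns units. A tiny sketch (integer math, names hypothetical) confirming the arithmetic:

```cpp
#include <cstdint>

// Convert 'ticks' at 'rate' ticks/sec into DirectShow 100-ns units,
// rounding to nearest. Used here only to check the 90000*10 offset.
static uint64_t ticks_to_100ns(uint64_t ticks, uint32_t rate)
{
    return (ticks * 10000000ULL + rate / 2) / rate;
}
```

ticks_to_100ns(90000 * 10, 90000) yields 100,000,000, i.e. exactly 10 s, so the offset itself is not the issue.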

The main question is, basically: is the algorithm sound? It seems to work fine running video/audio independently.

The source filter is not a push filter; I am not sure if that makes a difference. I do not have issues with the decoders falling out of sync with the broadcast input.

Many thanks.

+0

Does the lag grow over time, or does it start right away with a 2–3 second delay? Also, please reformat the code to be more readable. – BeemerGuy 2010-11-21 01:52:37

+0

Sorry about the code formatting. I will try to do better next time. – davroslyrad 2010-12-01 14:47:29

Answers

3

I actually figured out the problems, and there were two of them.

The first issue was poor handling of the SPS H.264 frames. When the decoder starts up, it discards every frame until it finds an SPS frame. The stream is encoded at 15 frames per second. This would throw the timing off, because the decoder would consume up to a second of video in less than 10 ms. Every frame presented after that was considered late, and it would try to fast-forward the frames to catch up. Being a live source, it would then run out of frames again. The work-around put into my code makes sure there is a buffer of at least 32 frames, roughly 2 seconds.
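The gate that implements this work-around is already visible in the FillBuffer() audio path above (the m_OtherSide->m_StartStream < 32 check). A minimal sketch of the idea, with hypothetical names, isolated from the filter state:

```cpp
// Hold audio back until the video side has delivered at least
// MIN_VIDEO_FRAMES samples (~2 s at 15 fps), so the decoder has
// found an SPS frame and built up a buffer before audio flows.
static const int MIN_VIDEO_FRAMES = 32;

static bool audio_may_flow(int videoFramesDelivered)
{
    return videoFramesDelivered >= MIN_VIDEO_FRAMES;
}
```

While the gate is closed, the audio pin delivers zero-length samples and sleeps briefly, exactly as in the FillBuffer() code above.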

The second issue really gets to the root of the problem. I was using the PTS from the RTP header as the time reference. While this can work in the individual audio-only and/or video-only case, there is no guarantee that a video RTP PTS will match the corresponding audio RTP PTS, and usually it will not. So, per the spec, use the RTCP NTP time according to the following formula:

PTS = RTCP_SR_NTP_timestamp + (RTP_timestamp - RTCP_SR_RTP_timestamp)/media_clock_rate 

This gave me actual video PTS values that match the corresponding audio PTS values.
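A hedged sketch of that mapping (struct and field names are assumptions, not the poster's code): each RTCP Sender Report pairs an NTP wallclock timestamp with an RTP timestamp, which lets both the audio and video streams be placed on a common NTP timeline per RFC 3550:

```cpp
#include <cstdint>

// Most recent RTCP Sender Report for one stream (names hypothetical).
struct RtcpSenderReport {
    double   ntp_seconds;   // NTP timestamp from the SR, as seconds
    uint32_t rtp_timestamp; // RTP timestamp paired with it in the SR
};

// PTS = SR_NTP + (RTP_ts - SR_RTP_ts) / media_clock_rate.
// The signed 32-bit difference handles RTP timestamp wrap-around.
static double rtp_to_ntp_seconds(uint32_t rtp_ts,
                                 const RtcpSenderReport &sr,
                                 uint32_t media_clock_rate)
{
    int32_t delta = (int32_t)(rtp_ts - sr.rtp_timestamp);
    return sr.ntp_seconds + (double)delta / (double)media_clock_rate;
}
```

Computing each pin's timestamps this way puts the 90 kHz video clock and the differently-clocked audio on the same timeline, which is what fixes the A/V offset.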