How to encode resampled PCM audio to AAC using the ffmpeg API when the number of input PCM samples is not equal to 1024

2013-06-03 · 6 votes

I am capturing audio and streaming it to an RTMP server. I am working under macOS (in Xcode), so to capture the audio sample buffers I use the AVFoundation framework. For encoding and streaming, however, I need to use the ffmpeg API with the libfaac encoder, since the output format must be AAC (to support stream playback on iOS devices).

I ran into the following problem: the audio capture device (in my case, a Logitech camera) gives me sample buffers containing 512 LPCM samples, and I can choose the input sample rate from 16000, 24000, 36000 or 48000 Hz. When I feed those 512 samples to the AAC encoder (configured for the matching sample rate), I hear slow and jerky audio (it sounds as if there is a patch of silence after each frame).

I figured out (though I may be wrong) that the libfaac encoder only accepts audio frames of 1024 samples. When I set the input sample rate to 24000 Hz and resample the input buffer to 48000 Hz before encoding, I get exactly 1024 resampled samples, and after encoding those 1024 samples to AAC the output sounds correct. But the output sample rate must be 48000 Hz, and my webcam produces 512 samples per buffer at every input sample rate, so I have to resample in every case, and after resampling I do not get exactly 1024 samples per buffer.
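To see why only the 24000 Hz case works out, one can compute the expected output count for each input rate with the same rounding the resampler uses. A minimal sketch (plain C; uses only av_rescale_rnd, which also appears in the encoding code below):

    #include <inttypes.h>
    #include <stdio.h>
    #include <libavutil/mathematics.h>

    int main(void)
    {
        const int in_samples = 512;    /* what the webcam delivers per buffer */
        const int out_rate = 48000;    /* required output rate */
        const int in_rates[] = { 16000, 24000, 36000, 48000 };

        for (int i = 0; i < 4; i++) {
            /* ceil(512 * 48000 / in_rate), the resampler's worst-case output */
            int64_t out = av_rescale_rnd(in_samples, out_rate, in_rates[i], AV_ROUND_UP);
            printf("%5d Hz -> %" PRId64 " samples\n", in_rates[i], out);
        }
        return 0;
    }

This prints 1536, 1024, 683 and 512 samples respectively: only the 24000 Hz input lands exactly on the 1024 samples that an AAC frame requires.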

Is there a way to solve this within the ffmpeg API functions?

I would appreciate any help.

PS: I suppose I could accumulate resampled buffers until the sample count reaches 1024 and then encode that, but since this is a live stream it would cause trouble with timestamps and with other input devices, so that solution does not seem appropriate.
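(For completeness: if one did accumulate, the timestamp trouble can be contained by deriving pts from a running sample counter rather than from each capture buffer. A hedged sketch; first_pts and samples_sent are hypothetical bookkeeping variables, not ffmpeg API:

    #include <libavutil/avutil.h>

    static int64_t first_pts = AV_NOPTS_VALUE; /* pts of the first captured buffer */
    static int64_t samples_sent = 0;           /* samples handed to the encoder so far */

    /* Returns the pts of the next 1024-sample frame in a {1, sample_rate} time base. */
    static int64_t next_frame_pts(int64_t capture_pts, int frame_size)
    {
        if (first_pts == AV_NOPTS_VALUE)
            first_pts = capture_pts;  /* anchor the stream once */
        int64_t pts = first_pts + samples_sent;
        samples_sent += frame_size;   /* 1024 for AAC */
        return pts;
    }

The answers below take this accumulation route.)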

The current problem grew out of the one described in this earlier question: How to fill audio AVFrame (ffmpeg) with the data obtained from CMSampleBufferRef (AVFoundation)?

Here is the audio codec configuration code (there is also a video stream, but the video works fine):

/* global variables */
static AVFrame *aframe;
static AVFrame *frame;
AVOutputFormat *fmt;
AVFormatContext *oc;
AVStream *audio_st, *video_st;

void Init()
{
    AVCodec *audio_codec, *video_codec;
    int ret;

    avcodec_register_all();
    av_register_all();
    avformat_network_init();
    /* filename: the output URL (RTMP), defined elsewhere */
    avformat_alloc_output_context2(&oc, NULL, "flv", filename);
    fmt = oc->oformat;
    oc->oformat->video_codec = AV_CODEC_ID_H264;
    oc->oformat->audio_codec = AV_CODEC_ID_AAC;
    video_st = NULL;
    audio_st = NULL;
    if (fmt->video_codec != AV_CODEC_ID_NONE) {
        // ... /* init video codec */
    }
    if (fmt->audio_codec != AV_CODEC_ID_NONE) {
        audio_codec = avcodec_find_encoder(fmt->audio_codec);
        if (!audio_codec) {
            fprintf(stderr, "Could not find encoder for '%s'\n",
                    avcodec_get_name(fmt->audio_codec));
            exit(1);
        }
        audio_st = avformat_new_stream(oc, audio_codec);
        if (!audio_st) {
            fprintf(stderr, "Could not allocate stream\n");
            exit(1);
        }
        audio_st->id = oc->nb_streams - 1;

        // AAC:
        audio_st->codec->sample_fmt = AV_SAMPLE_FMT_S16;
        audio_st->codec->bit_rate = 32000;
        audio_st->codec->sample_rate = 48000;
        audio_st->codec->profile = FF_PROFILE_AAC_LOW;
        audio_st->time_base = (AVRational){1, audio_st->codec->sample_rate};
        audio_st->codec->channels = 1;
        audio_st->codec->channel_layout = AV_CH_LAYOUT_MONO;

        if (oc->oformat->flags & AVFMT_GLOBALHEADER)
            audio_st->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
    }

    if (video_st) {
        // ...
        /* prepare video */
    }
    if (audio_st) {
        aframe = avcodec_alloc_frame(); /* old API; av_frame_alloc() in newer ffmpeg */
        if (!aframe) {
            fprintf(stderr, "Could not allocate audio frame\n");
            exit(1);
        }
        AVCodecContext *c = audio_st->codec;

        ret = avcodec_open2(c, audio_codec, NULL);
        if (ret < 0) {
            fprintf(stderr, "Could not open audio codec: %s\n", av_err2str(ret));
            exit(1);
        }

        // ...
    }
}
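One detail worth noting about the configuration above: once avcodec_open2() has succeeded, the encoder reports the exact number of samples it needs per frame in c->frame_size (1024 for AAC), so the value does not have to be hard-coded. A small check one could add right after opening the codec:

    /* The opened encoder publishes its required frame length;
       size any accumulation buffer from this value. */
    fprintf(stderr, "AAC encoder expects %d samples per frame\n", c->frame_size);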

And the resampling and encoding of the audio:

if (mType == kCMMediaType_Audio)
{
    CMSampleTimingInfo timing_info;
    CMSampleBufferGetSampleTimingInfo(sampleBuffer, 0, &timing_info);
    double pts = 0;
    double dts = 0;
    AVCodecContext *c;
    AVPacket pkt = { 0 }; // data and size must be 0
    int got_packet, ret;
    av_init_packet(&pkt);
    c = audio_st->codec;
    CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBuffer);

    NSUInteger channelIndex = 0;

    // Get a pointer to the raw interleaved S16 samples in the CMSampleBuffer.
    CMBlockBufferRef audioBlockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t audioBlockBufferOffset = (channelIndex * numSamples * sizeof(SInt16));
    size_t lengthAtOffset = 0;
    size_t totalLength = 0;
    SInt16 *samples = NULL;
    CMBlockBufferGetDataPointer(audioBlockBuffer, audioBlockBufferOffset,
                                &lengthAtOffset, &totalLength, (char **)(&samples));

    const AudioStreamBasicDescription *audioDescription =
        CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(sampleBuffer));

    SwrContext *swr = swr_alloc();

    int in_smprt = (int)audioDescription->mSampleRate;
    av_opt_set_int(swr, "in_channel_layout", AV_CH_LAYOUT_MONO, 0);
    av_opt_set_int(swr, "out_channel_layout", audio_st->codec->channel_layout, 0);
    av_opt_set_int(swr, "in_channel_count", audioDescription->mChannelsPerFrame, 0);
    av_opt_set_int(swr, "out_channel_count", audio_st->codec->channels, 0);
    av_opt_set_int(swr, "in_sample_rate", audioDescription->mSampleRate, 0);
    av_opt_set_int(swr, "out_sample_rate", audio_st->codec->sample_rate, 0);
    av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_S16, 0);
    av_opt_set_sample_fmt(swr, "out_sample_fmt", audio_st->codec->sample_fmt, 0);

    swr_init(swr);

    uint8_t **input = NULL;
    int src_linesize;
    int in_samples = (int)numSamples;
    ret = av_samples_alloc_array_and_samples(&input, &src_linesize,
                                             audioDescription->mChannelsPerFrame,
                                             in_samples, AV_SAMPLE_FMT_S16P, 0);

    // Reuse the capture buffer as the input plane (note: the buffer that
    // av_samples_alloc_array_and_samples allocated for input[0] is leaked here).
    *input = (uint8_t *)samples;
    uint8_t *output = NULL;

    // Worst-case output count: resampler delay plus the rescaled input count.
    int out_samples = av_rescale_rnd(swr_get_delay(swr, in_smprt) + in_samples,
                                     (int)audio_st->codec->sample_rate, in_smprt,
                                     AV_ROUND_UP);

    av_samples_alloc(&output, NULL, audio_st->codec->channels, out_samples,
                     audio_st->codec->sample_fmt, 0);
    out_samples = swr_convert(swr, &output, out_samples,
                              (const uint8_t **)input, in_samples);

    aframe->nb_samples = (int)out_samples;

    ret = avcodec_fill_audio_frame(aframe, audio_st->codec->channels,
                                   audio_st->codec->sample_fmt,
                                   (uint8_t *)output,
                                   (int)out_samples *
                                       av_get_bytes_per_sample(audio_st->codec->sample_fmt) *
                                       audio_st->codec->channels, 1);

    aframe->channel_layout = audio_st->codec->channel_layout;
    aframe->channels = audio_st->codec->channels;
    aframe->sample_rate = audio_st->codec->sample_rate;

    if (timing_info.presentationTimeStamp.timescale != 0)
        pts = (double)timing_info.presentationTimeStamp.value /
              timing_info.presentationTimeStamp.timescale;

    aframe->pts = pts * audio_st->time_base.den;
    aframe->pts = av_rescale_q(aframe->pts, audio_st->time_base, audio_st->codec->time_base);

    ret = avcodec_encode_audio2(c, &pkt, aframe, &got_packet);

    if (ret < 0) {
        fprintf(stderr, "Error encoding audio frame: %s\n", av_err2str(ret));
        exit(1);
    }
    swr_free(&swr);
    av_freep(&output); // the encoder has consumed the samples by now
    av_freep(&input);  // frees the plane-pointer array (capture data is owned by CoreMedia)

    if (got_packet)
    {
        pkt.stream_index = audio_st->index;

        pkt.pts = av_rescale_q(pkt.pts, audio_st->codec->time_base, audio_st->time_base);
        pkt.dts = av_rescale_q(pkt.dts, audio_st->codec->time_base, audio_st->time_base);

        // Write the compressed frame to the media file.
        ret = av_interleaved_write_frame(oc, &pkt);
        if (ret != 0) {
            fprintf(stderr, "Error while writing audio frame: %s\n",
                    av_err2str(ret));
            exit(1);
        }
    }
}
Comments:

iPhones support more audio codecs than just AAC. Just curious why you support only AAC; doesn't the Logitech camera support any of the following: G.711 (ulaw), ADPCM, MPEG-2 audio, etc.? Most cameras we know of support at least G.711, and technically, with some extra API work, AMR is possible too. –

The AAC codec is one of the codecs required by my task, and I do not have this problem with other codecs. – Aleksei2414904

Hi Aleksei2414904, I am encoding PCM samples to AAC on Android and am facing the same problem. Please help me if you have found any solution. –

Answers

0 votes

I had a similar problem. I was encoding PCM packets to AAC, and the length of the PCM packets was sometimes smaller than 1024 samples.

If I encoded a packet smaller than 1024, the audio was slow. On the other hand, if I threw it away, the audio got faster. From my observation the swr_convert function does not do any automatic buffering.

I ended up with a scheme where a packet is filled into a 1024-sample buffer, and the buffer gets encoded and cleaned each time it is full.

The function that fills the buffer:

// put frame data into buffer of fixed size 
bool ffmpegHelper::putAudioBuffer(const AVFrame *pAvFrameIn, AVFrame **pAvFrameBuffer, AVCodecContext *dec_ctx, int frame_size, int &k0) { 
    // prepare pFrameAudio 
    if (!(*pAvFrameBuffer)) { 
    if (!(*pAvFrameBuffer = av_frame_alloc())) { 
     av_log(NULL, AV_LOG_ERROR, "Alloc frame failed\n"); 
     return false; 
    } else { 
     (*pAvFrameBuffer)->format = dec_ctx->sample_fmt; 
     (*pAvFrameBuffer)->channels = dec_ctx->channels; 
     (*pAvFrameBuffer)->sample_rate = dec_ctx->sample_rate; 
     (*pAvFrameBuffer)->nb_samples = frame_size; 
     int ret = av_frame_get_buffer(*pAvFrameBuffer, 0); 
     if (ret < 0) { 
     char err[500]; 
     av_log(NULL, AV_LOG_ERROR, "get audio buffer failed: %s\n", 
      av_make_error_string(err, AV_ERROR_MAX_STRING_SIZE, ret)); 
     return false; 
     } 
     (*pAvFrameBuffer)->nb_samples = 0; 
     (*pAvFrameBuffer)->pts = pAvFrameIn->pts; 
    } 
    } 

    // copy input data to buffer 
    int n_channels = pAvFrameIn->channels; 
    int new_samples = std::min(pAvFrameIn->nb_samples - k0, frame_size - (*pAvFrameBuffer)->nb_samples); // std::min: include <algorithm>
    int k1 = (*pAvFrameBuffer)->nb_samples; 

    if (pAvFrameIn->format == AV_SAMPLE_FMT_S16) { 
    int16_t *d_in = (int16_t *)pAvFrameIn->data[0]; 
    d_in += n_channels * k0; 
    int16_t *d_out = (int16_t *)(*pAvFrameBuffer)->data[0]; 
    d_out += n_channels * k1; 

    for (int i = 0; i < new_samples; ++i) { 
     for (int j = 0; j < pAvFrameIn->channels; ++j) { 
     *d_out++ = *d_in++; 
     } 
    } 
    } else { 
    printf("not handled format for audio buffer\n"); 
    return false; 
    } 

    (*pAvFrameBuffer)->nb_samples += new_samples; 
    k0 += new_samples; 

    return true; 
} 

And the loop that fills the buffer and encodes it:

// transcoding needed
int got_frame;
// decode the packet first (decodePacket is your own helper)
decodePacket(packet, dec_ctx, &pAvFrame_, got_frame);

if (enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO) { 
    ret = 0; 
    // break audio packet down to buffer 
    if (enc_ctx->frame_size > 0) { 
     int k = 0; 
     while (k < pAvFrame_->nb_samples) { 
      if (!putAudioBuffer(pAvFrame_, &pFrameAudio_, dec_ctx, enc_ctx->frame_size, k)) 
       return false; 
      if (pFrameAudio_->nb_samples == enc_ctx->frame_size) { 
       // the buffer is full, encode it (do it yourself) 
       ret = encodeFrame(pFrameAudio_, stream_index, got_frame, false); 
       if (ret < 0) 
        return false; 
       pFrameAudio_->pts += enc_ctx->frame_size; 
       pFrameAudio_->nb_samples = 0; 
      } 
     } 
    } else { 
     ret = encodeFrame(pAvFrame_, stream_index, got_frame, false); 
    } 
} else { 
    // encode packet directly 
    ret = encodeFrame(pAvFrame_, stream_index, got_frame, false); 
} 
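One gap in this scheme is the tail of the stream: when the input ends, the buffer usually still holds fewer than frame_size samples. A hedged sketch of one way to flush it, padding the remainder with silence via av_samples_set_silence() (reusing the pFrameAudio_ and encodeFrame names from above):

    // At end of stream: pad the partially filled buffer with silence
    // so the encoder still receives a full frame_size samples.
    if (pFrameAudio_ && pFrameAudio_->nb_samples > 0) {
        int pad_from = pFrameAudio_->nb_samples;
        av_samples_set_silence(pFrameAudio_->data, pad_from,
                               enc_ctx->frame_size - pad_from,
                               pFrameAudio_->channels,
                               (enum AVSampleFormat)pFrameAudio_->format);
        pFrameAudio_->nb_samples = enc_ctx->frame_size;
        ret = encodeFrame(pFrameAudio_, stream_index, got_frame, false);
    }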
0 votes

You have to break the sample buffer into chunks of size 1024. I did this for recording MP3 in Android; for more information follow these links: link1, link2.

0 votes

If anyone else ends up here: I had the same problem, and just as @Mohit pointed out for AAC, each audio frame has to be broken down into 1024-byte chunks.

For example:

uint8_t *buffer = (uint8_t *) malloc(1024);
AVFrame *frame = av_frame_alloc();
// Read the PCM input 1024 bytes at a time and encode chunk by chunk.
while ((fread(buffer, 1024, 1, fp)) == 1) {
    frame->data[0] = buffer;
    // ... set frame->nb_samples to match and send `frame` to the encoder ...
}
1 vote

I ended up here after having a similar problem. I am reading audio and video with a Blackmagic Decklink SDI card at 720p50, which means I get 960 samples per video frame (48k / 50 fps) that I want to encode together with the video. When I sent only 960 samples to aacenc, the audio sounded really strange, and the encoder did not really complain about this fact either.

I started using AVAudioFifo (see ffmpeg/doc/examples/transcode_aac.c) and kept adding frames to it until I had enough samples to satisfy aacenc. This presumably means my samples play a little too late, I guess, since the pts is set once 1024 samples are available, while the first 960 really should have a different value. But, as far as I can hear/see, it is not really noticeable.
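For reference, the AVAudioFifo pattern described above looks roughly like this (a condensed sketch modeled on ffmpeg/doc/examples/transcode_aac.c; enc_ctx and the captured frame `decoded` are assumed to exist):

    #include <libavutil/audio_fifo.h>

    /* Create a FIFO sized for at least one full encoder frame. */
    AVAudioFifo *fifo = av_audio_fifo_alloc(enc_ctx->sample_fmt,
                                            enc_ctx->channels,
                                            enc_ctx->frame_size);

    /* Push every captured frame, whatever its size (e.g. 960 samples). */
    av_audio_fifo_write(fifo, (void **)decoded->data, decoded->nb_samples);

    /* Pop exactly frame_size (1024) samples whenever enough are queued. */
    while (av_audio_fifo_size(fifo) >= enc_ctx->frame_size) {
        AVFrame *out = av_frame_alloc();
        out->nb_samples     = enc_ctx->frame_size;
        out->format         = enc_ctx->sample_fmt;
        out->channel_layout = enc_ctx->channel_layout;
        out->sample_rate    = enc_ctx->sample_rate;
        av_frame_get_buffer(out, 0);
        av_audio_fifo_read(fifo, (void **)out->data, enc_ctx->frame_size);
        /* ... set pts, encode `out`, then av_frame_free(&out) ... */
    }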