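Applying an FFT to an audio recording in Java

2014-01-30

I have seen questions similar to this one on this site, but mine is a little different. The code I use to capture the audio is this. I simply want to take the captured audio and apply a 256-point FFT to it.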

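I realize that the line count = line.read(buffer, 0, buffer.length); breaks the audio up into "chunks".

Also, the FFT I am using can be found here.

My questions are:

  1. Is there a way to apply the FFT to the whole recording rather than to just one buffer's worth?
  2. The FFT code requires a real part and an imaginary part. How do I get the real and imaginary parts from the audio capture code?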

You can't "simply" do this. You *can* do it the complicated way, by converting the bytes to audio samples (manually, as-is). Have you looked into it much? – Radiodef


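One of my lecturers advised me to do it this way, so I have not tried the approach you suggest. Is there a simpler way to do it? I really do need a 256-point FFT. I will definitely read up on the approach you suggest. –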


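What I described is basically the only way to do it in Java. All Java Sound does is read in raw bytes and write them to an output. It is entirely possible to "intercept" the stream, convert the bytes yourself, and do whatever you want with them. – Radiodef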

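Answer

All the javax.sound.sampled package does is read raw bytes from a file and write them to an output. So there is an "in between" step that you have to do yourself, which is converting the samples.

Here is how to do that (with comments) for PCM, taken from my code example WaveformDemo: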

public static float[] unpack(
    byte[] bytes, 
    long[] transfer, 
    float[] samples, 
    int bvalid, 
    AudioFormat fmt 
) { 
    if(fmt.getEncoding() != AudioFormat.Encoding.PCM_SIGNED 
      && fmt.getEncoding() != AudioFormat.Encoding.PCM_UNSIGNED) { 

     return samples; 
    } 

    final int bitsPerSample = fmt.getSampleSizeInBits(); 
    final int bytesPerSample = bitsPerSample/8; 
    final int normalBytes = normalBytesFromBits(bitsPerSample); 

    /* 
    * not the most DRY way to do this but it's a bit more efficient. 
    * otherwise there would either have to be 4 separate methods for 
    * each combination of endianness/signedness or do it all in one 
    * loop and check the format for each sample. 
    * 
    * a helper array (transfer) allows the logic to be split up 
    * but without being too repetitive. 
    * 
    * here there are two loops converting bytes to raw long samples. 
    * integral primitives in Java get sign extended when they are 
    * promoted to a larger type so the & 0xffL mask keeps them intact. 
    * 
    */ 

    if(fmt.isBigEndian()) { 
     for(int i = 0, k = 0, b; i < bvalid; i += normalBytes, k++) { 
      transfer[k] = 0L; 

      int least = i + normalBytes - 1; 
      for(b = 0; b < normalBytes; b++) { 
       transfer[k] |= (bytes[least - b] & 0xffL) << (8 * b); 
      } 
     } 
    } else { 
     for(int i = 0, k = 0, b; i < bvalid; i += normalBytes, k++) { 
      transfer[k] = 0L; 

      for(b = 0; b < normalBytes; b++) { 
       transfer[k] |= (bytes[i + b] & 0xffL) << (8 * b); 
      } 
     } 
    } 

    final long fullScale = (long)Math.pow(2.0, bitsPerSample - 1); 

    /* 
    * the OR is not quite enough to convert, 
    * the sign needs to be corrected. 
    * 
    */ 

    if(fmt.getEncoding() == AudioFormat.Encoding.PCM_SIGNED) { 

     /* 
     * if the samples were signed, they must be 
     * extended to the 64-bit long. 
     * 
     * so first check if the sign bit was set 
     * and if so, extend it. 
     * 
     * as an example, imagining these were 4-bit samples originally 
     * and the destination is 8-bit, a mask can be constructed 
     * with -1 (all bits 1) and a left shift: 
     * 
     *  11111111 
     * << (4 - 1) 
     * =========== 
     *  11111000 
     * 
     * (except the destination is 64-bit and the original 
     * bit depth from the file could be anything.) 
     * 
     * then supposing we have a hypothetical sample -5 
     * that ought to be negative, an AND can be used to check it: 
     * 
     * 00001011 
     * & 11111000 
     * ========== 
     * 00001000 
     * 
     * and an OR can be used to extend it: 
     * 
     * 00001011 
     * | 11111000 
     * ========== 
     * 11111011 
     * 
     */ 

     // parentheses make the precedence explicit: shift -1L left by (bits - 1)
     final long signMask = -1L << (bitsPerSample - 1); 

     for(int i = 0; i < transfer.length; i++) { 
      if((transfer[i] & signMask) != 0L) { 
       transfer[i] |= signMask; 
      } 
     } 
    } else { 

     /* 
     * unsigned samples are easier since they 
     * will be read correctly in to the long. 
     * 
     * so just sign them: 
     * subtract 2^(bits - 1) so the center is 0. 
     * 
     */ 

     for(int i = 0; i < transfer.length; i++) { 
      transfer[i] -= fullScale; 
     } 
    } 

    /* finally normalize to range of -1.0f to 1.0f */ 

    for(int i = 0; i < transfer.length; i++) { 
     samples[i] = (float)transfer[i]/(float)fullScale; 
    } 

    return samples; 
} 

public static int normalBytesFromBits(int bitsPerSample) { 

    /* 
    * some formats allow for bit depths in non-multiples of 8. 
    * they will, however, typically pad so the samples are stored 
    * that way. AIFF is one of these formats. 
    * 
    * so the expression: 
    * 
    * bitsPerSample + 7 >> 3 
    * 
    * computes a division by 8, rounding up (for positive numbers). 
    * 
    * this is basically equivalent to: 
    * 
    * (int)Math.ceil(bitsPerSample/8.0) 
    * 
    */ 

    return bitsPerSample + 7 >> 3; 
} 

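This code assumes float[], and your FFT wants a double[], but that is a fairly simple change. transfer and samples are arrays with one element per sample, so their length is bytes.length / normalBytes, and bvalid is the return value from read. My code example assumes an AudioInputStream, but the same conversion should apply to a TargetDataLine. I am not sure you can literally copy and paste it, but it serves as an example.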
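As a much smaller illustration of the same idea, here is a minimal sketch that collects every chunk returned by read into one byte array and then converts it straight to double[]. It is not the WaveformDemo code: the class name RecordingSamples and both method names are hypothetical, a plain InputStream stands in for the capture line, and the format is assumed to be 16-bit signed little-endian mono PCM.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, assuming 16-bit signed little-endian mono PCM.
final class RecordingSamples {

    // Reads chunk by chunk (as line.read(buffer, 0, buffer.length) does)
    // and keeps every chunk, so the whole recording ends up in one array.
    static byte[] readAll(InputStream in, int bufferSize) throws IOException {
        ByteArrayOutputStream recording = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        int count;
        while ((count = in.read(buffer, 0, buffer.length)) > 0) {
            recording.write(buffer, 0, count);
        }
        return recording.toByteArray();
    }

    // Converts the raw bytes to samples in the range -1.0 to 1.0.
    static double[] toDoubles(byte[] pcm) {
        int n = pcm.length / 2;          // two bytes per sample
        double[] samples = new double[n];
        for (int i = 0; i < n; i++) {
            int low = pcm[2 * i] & 0xff; // mask so the low byte stays unsigned
            int high = pcm[2 * i + 1];   // high byte keeps its sign
            samples[i] = ((high << 8) | low) / 32768.0;
        }
        return samples;
    }
}
```

For any other bit depth, byte order, or signedness, the general unpack method above is the one to adapt.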

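Regarding your two questions:

  1. You can either take one very long FFT over the whole recording, or average the FFTs from each buffer.
  2. The FFT you linked computes in place. So the real part is the audio samples, and the imaginary part is an empty array (filled with zeros) of the same length as the real part.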
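The two points above can be sketched together: split the recording into 256-sample frames, pair each frame with an all-zero imaginary array, transform in place, and average the results. This is only an illustration under some assumptions. The class FramedSpectrum and its methods are hypothetical, transform is a slow naive DFT standing in for the linked FFT (whose exact method signature is not shown here), and "averaging the FFTs" is read as averaging the magnitude of each bin.

```java
import java.util.Arrays;

// Hypothetical sketch: 256-point frames, zero imaginary part, averaged magnitudes.
final class FramedSpectrum {
    static final int N = 256;

    // Stand-in for the linked FFT: a naive O(N^2) DFT that overwrites
    // re and im with the result, the way an in-place FFT would.
    static void transform(double[] re, double[] im) {
        int n = re.length;
        double[] outRe = new double[n];
        double[] outIm = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                double c = Math.cos(angle);
                double s = Math.sin(angle);
                outRe[k] += re[t] * c - im[t] * s;
                outIm[k] += re[t] * s + im[t] * c;
            }
        }
        System.arraycopy(outRe, 0, re, 0, n);
        System.arraycopy(outIm, 0, im, 0, n);
    }

    // Averages the per-bin magnitudes over all complete frames.
    // Trailing samples that do not fill a frame are ignored.
    static double[] averageMagnitudes(double[] samples) {
        double[] average = new double[N];
        int frames = samples.length / N;
        for (int f = 0; f < frames; f++) {
            double[] re = Arrays.copyOfRange(samples, f * N, (f + 1) * N);
            double[] im = new double[N]; // imaginary part: all zeros
            transform(re, im);
            for (int k = 0; k < N; k++) {
                average[k] += Math.hypot(re[k], im[k]);
            }
        }
        if (frames > 0) {
            for (int k = 0; k < N; k++) {
                average[k] /= frames;
            }
        }
        return average;
    }
}
```

Swapping transform for a real FFT should only change the one call inside the loop. Taking a single long FFT instead would mean passing the whole recording as re, padded or trimmed to whatever length that FFT accepts.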

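But once the FFT is done, there are still a few things you need to do that I don't see the linked class doing:

  • Convert to polar coordinates.
  • Typically, discard the negative frequencies (the entire upper half of the result, which is a mirror image of the lower half).
  • Potentially scale the resulting magnitudes (the real part) by dividing them by the length of the transform.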
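A minimal sketch of those post-processing steps, assuming re and im hold the transform output. The class PolarSpectrum and its method names are hypothetical, and dividing by the full transform length is only one common scaling convention.

```java
// Hypothetical post-processing for the output of a complex FFT of real audio.
final class PolarSpectrum {

    // Magnitude of each bin in the lower half, divided by the transform length.
    // The upper half mirrors the lower half for real input, so it is dropped.
    static double[] magnitudes(double[] re, double[] im) {
        int n = re.length;
        double[] mag = new double[n / 2];
        for (int k = 0; k < mag.length; k++) {
            mag[k] = Math.hypot(re[k], im[k]) / n; // sqrt(re^2 + im^2) / n
        }
        return mag;
    }

    // Phase of each bin in the lower half, in radians from -pi to pi.
    static double[] phases(double[] re, double[] im) {
        double[] phase = new double[re.length / 2];
        for (int k = 0; k < phase.length; k++) {
            phase[k] = Math.atan2(im[k], re[k]);
        }
        return phase;
    }
}
```

Math.atan2 is used rather than a plain arctangent of im / re because it picks the correct quadrant and does not divide by zero when the real part is 0.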

Edit, related:


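Thank you very much for your reply, it is very thorough! It is exactly what I was looking for. I was just wondering whether there is any material I could read to learn how to convert to polar coordinates. –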


You're welcome. For polar coordinates, I suppose http://www.dspguide.com/ch8/8.htm (Equation 8-6). Just write a small method that computes the equation. – Radiodef


Thank you again, you have helped me a lot. –