Applying an FFT to recorded audio in Java

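I have seen questions similar to this one on this site, but mine is a little different. The code I am using to capture the audio is this. I simply want to take the captured audio and apply a 256-point FFT to it.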

I realize that the line count = line.read(buffer, 0, buffer.length); breaks the audio up into "chunks".

Also, the FFT I am using can be found here.

My questions are:

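Is there a way to apply the FFT to the whole recording rather than to just one buffer's worth of audio?
The FFT code takes a real part and an imaginary part. How do I get the real and imaginary parts from the captured audio?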


2014-01-30 Baker Johnson

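You can't 'simply' do this. You *can* do it the complicated way, by converting the bytes to audio samples (manually, as they are). Have you looked around much? – Radiodef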

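One of my lecturers suggested I do it this way, so I haven't tried the approach you describe. Is there an easier way to do this? I really do need the 256-point FFT. I will certainly read up on the method you suggest. –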

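What I am describing is basically the only way to do it in Java. All Java Sound does is read in raw bytes and write them to an output. It is entirely possible to "intercept" the stream, convert the bytes yourself, and do whatever you want with them. – Radiodef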

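All the javax.sound.sampled package does is read raw bytes from a file and write them to an output. So there is an "in between" step you have to do yourself, which is converting the bytes to samples.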

The following shows how to do this (with comments) for PCM, taken from my code example WaveformDemo:

// requires: import javax.sound.sampled.AudioFormat;
public static float[] unpack(
    byte[] bytes, 
    long[] transfer, 
    float[] samples, 
    int bvalid, 
    AudioFormat fmt 
) { 
    if(fmt.getEncoding() != AudioFormat.Encoding.PCM_SIGNED 
      && fmt.getEncoding() != AudioFormat.Encoding.PCM_UNSIGNED) { 

     return samples; 
    } 

    final int bitsPerSample = fmt.getSampleSizeInBits(); 
    final int bytesPerSample = bitsPerSample/8; 
    final int normalBytes = normalBytesFromBits(bitsPerSample); 

    /* 
    * not the most DRY way to do this but it's a bit more efficient. 
    * otherwise there would either have to be 4 separate methods for 
    * each combination of endianness/signedness or do it all in one 
    * loop and check the format for each sample. 
    * 
    * a helper array (transfer) allows the logic to be split up 
    * but without being too repetitive.
    * 
    * here there are two loops converting bytes to raw long samples. 
    * integral primitives in Java get sign extended when they are 
    * promoted to a larger type so the & 0xffL mask keeps them intact. 
    * 
    */ 

    if(fmt.isBigEndian()) { 
     for(int i = 0, k = 0, b; i < bvalid; i += normalBytes, k++) { 
      transfer[k] = 0L; 

      int least = i + normalBytes - 1; 
      for(b = 0; b < normalBytes; b++) { 
       transfer[k] |= (bytes[least - b] & 0xffL) << (8 * b); 
      } 
     } 
    } else { 
     for(int i = 0, k = 0, b; i < bvalid; i += normalBytes, k++) { 
      transfer[k] = 0L; 

      for(b = 0; b < normalBytes; b++) { 
       transfer[k] |= (bytes[i + b] & 0xffL) << (8 * b); 
      } 
     } 
    } 

    final long fullScale = (long)Math.pow(2.0, bitsPerSample - 1); 

    /* 
    * the OR is not quite enough to convert,
    * the sign needs to be corrected.
    * 
    */ 

    if(fmt.getEncoding() == AudioFormat.Encoding.PCM_SIGNED) { 

     /* 
     * if the samples were signed, they must be 
     * extended to the 64-bit long. 
     * 
     * so first check if the sign bit was set 
     * and if so, extend it. 
     * 
     * as an example, imagining these were 4-bit samples originally 
     * and the destination is 8-bit, a mask can be constructed 
     * with -1 (all bits 1) and a left shift: 
     * 
     *  11111111 
     * << (4 - 1) 
     * =========== 
     *  11111000 
     * 
     * (except the destination is 64-bit and the original 
     * bit depth from the file could be anything.) 
     * 
     * then supposing we have a hypothetical sample -5 
     * that ought to be negative, an AND can be used to check it: 
     * 
     * 00001011 
     * & 11111000 
     * ========== 
     * 00001000 
     * 
     * and an OR can be used to extend it: 
     * 
     * 00001011 
     * | 11111000 
     * ========== 
     * 11111011 
     * 
     */ 

     final long signMask = -1L << bitsPerSample - 1L; 

     for(int i = 0; i < transfer.length; i++) { 
      if((transfer[i] & signMask) != 0L) { 
       transfer[i] |= signMask; 
      } 
     } 
    } else { 

     /* 
     * unsigned samples are easier since they 
     * will be read correctly in to the long. 
     * 
     * so just sign them: 
     * subtract 2^(bits - 1) so the center is 0. 
     * 
     */ 

     for(int i = 0; i < transfer.length; i++) { 
      transfer[i] -= fullScale; 
     } 
    } 

    /* finally normalize to range of -1.0f to 1.0f */ 

    for(int i = 0; i < transfer.length; i++) { 
     samples[i] = (float)transfer[i]/(float)fullScale; 
    } 

    return samples; 
} 

public static int normalBytesFromBits(int bitsPerSample) { 

    /* 
    * some formats allow for bit depths in non-multiples of 8. 
    * they will, however, typically pad so the samples are stored 
    * that way. AIFF is one of these formats. 
    * 
    * so the expression: 
    * 
    * bitsPerSample + 7 >> 3 
    * 
    * computes a division of 8 rounding up (for positive numbers). 
    * 
    * this is basically equivalent to: 
    * 
    * (int)Math.ceil(bitsPerSample/8.0) 
    * 
    */ 

    return bitsPerSample + 7 >> 3; 
}

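This code assumes float[] and your FFT wants a double[], but that is a fairly simple change. transfer and samples are arrays of length bytes.length / normalBytes (one element per sample), and bvalid is the return value from read. My code example assumes an AudioInputStream, but the same conversion should apply to a TargetDataLine. I am not sure you can literally copy and paste it, but it is an example.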
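As a rough sketch of the float[] to double[] change, and of cutting a recording into 256-sample pieces, something like the following could work. FrameSplitter, toFrames, and the choice to zero-pad the last frame are my own assumptions for illustration; none of them come from WaveformDemo or the linked FFT class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups unpacked samples into fixed-size FFT frames. */
final class FrameSplitter {

    /**
     * Copies the first validSamples elements of samples into double[]
     * frames of frameSize elements each. The last frame is zero-padded
     * if the recording does not fill it completely.
     */
    static List<double[]> toFrames(float[] samples, int validSamples, int frameSize) {
        List<double[]> frames = new ArrayList<>();
        for (int start = 0; start < validSamples; start += frameSize) {
            double[] frame = new double[frameSize]; // Java zero-initialises arrays
            int n = Math.min(frameSize, validSamples - start);
            for (int i = 0; i < n; i++) {
                frame[i] = samples[start + i]; // widening float -> double
            }
            frames.add(frame);
        }
        return frames;
    }
}
```

With the unpack method above, validSamples would be bvalid divided by the number of bytes per sample, and frameSize would be 256 for a 256-point FFT.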

Regarding your two questions:

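You can either take one very long FFT over the whole recording or average the FFTs of the individual buffers.
The FFT you linked to computes in place. So the real part is the audio samples, and the imaginary part is an empty array (filled with zeros) of the same length as the real part.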
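To make the two points above concrete, here is a minimal sketch. I don't want to guess at the API of the class you linked, so SimpleFft below is a stand-in I wrote for illustration (a plain in-place radix-2 FFT), not that class. averageMagnitudes is also hypothetical, and it assumes that "averaging the FFTs" means averaging the magnitude of each bin over all frames.

```java
import java.util.List;

/** Stand-in FFT for illustration only; not the class linked in the question. */
final class SimpleFft {

    /** In-place radix-2 FFT. Both arrays must have the same power-of-two length. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n != im.length || Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Combine sub-transforms of length 2, 4, 8, ... up to n.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2.0 * Math.PI / len;
            for (int i = 0; i < n; i += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k);
                    double wi = Math.sin(ang * k);
                    int a = i + k;
                    int b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    /** Averages the per-bin magnitudes of equal-length frames. */
    static double[] averageMagnitudes(List<double[]> frames) {
        int n = frames.get(0).length;
        double[] avg = new double[n];
        for (double[] frame : frames) {
            double[] re = frame.clone(); // real part: the audio samples
            double[] im = new double[n]; // imaginary part: all zeros
            fft(re, im);
            for (int k = 0; k < n; k++) {
                avg[k] += Math.hypot(re[k], im[k]) / frames.size();
            }
        }
        return avg;
    }
}
```

For one long FFT over the whole recording you would instead pass a single array holding every sample, padded with zeros up to the next power of two, since this sketch only accepts power-of-two lengths.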

However, once the FFT is done there are still a few things you need to do that I don't see the linked class doing:

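Convert to polar coordinates.
Typically, discard the negative frequencies (the entire upper half, which is a mirror image of the lower half).
Potentially, scale the resulting magnitudes (the real part) by dividing them by the length of the transform.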
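The three steps above could be sketched roughly as follows. SpectrumUtil and its method names are hypothetical. It assumes re and im hold the output of a forward FFT of a real signal, and it keeps bins 0 to n/2 - 1 as "the lower half". Magnitude is sqrt(re^2 + im^2) and phase is atan2(im, re).

```java
/** Hypothetical post-processing helpers for FFT output. */
final class SpectrumUtil {

    /**
     * Polar magnitude of the lower half of the spectrum,
     * scaled by dividing by the transform length.
     */
    static double[] magnitudes(double[] re, double[] im) {
        int n = re.length;
        double[] mag = new double[n / 2]; // the upper half is discarded
        for (int k = 0; k < mag.length; k++) {
            mag[k] = Math.hypot(re[k], im[k]) / n;
        }
        return mag;
    }

    /** Polar phase, in radians, of the lower half of the spectrum. */
    static double[] phases(double[] re, double[] im) {
        double[] phase = new double[re.length / 2];
        for (int k = 0; k < phase.length; k++) {
            phase[k] = Math.atan2(im[k], re[k]);
        }
        return phase;
    }
}
```

Whether to divide by the transform length is a matter of convention, which is why that step is only "potentially" needed.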

Edit, related:

How do I use audio sample data from Java Sound?


2014-01-31 00:39:39 Radiodef

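Thank you very much for your reply, it is very thorough! Exactly what I was looking for. I was just wondering whether there is any material I could read to learn how to convert to polar coordinates. –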

You're welcome. For polar coordinates, I suppose http://www.dspguide.com/ch8/8.htm (Equation 8-6). Just write a small method that computes the equation. – Radiodef

Thanks again, you have helped me a lot. –
