Reading strings from a binary file in Java

I have read every page I could find online, but none of the solutions worked for me.

I have a binary file created by C code. I also have the C reader for this binary file. I need to write a Java reader for it.

In the C code, the following statement reads a string (stored in 'vocab' starting at offset 'b * max_w') followed by a single character:

fscanf(f, "%s%c", &vocab[b * max_w], &ch);
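A minimal sketch of a Java counterpart to the 'fscanf(f, "%s%c", ...)' call above, assuming the words are stored as single-byte ASCII text (a C 'char' is one byte, not two). 'ScanfWord', 'readWord' and 'isCSpace' are hypothetical names. It reads one byte at a time from a buffered stream, skips leading whitespace as '%s' does, collects bytes up to the next whitespace byte, and consumes that byte the way the trailing '%c' does.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ScanfWord {

    // Same set of characters as C's isspace() in the default "C" locale.
    static boolean isCSpace(int b) {
        return b == ' ' || (b >= '\t' && b <= '\r');
    }

    // Returns the next whitespace-delimited word, or null at end of stream.
    static String readWord(InputStream in) throws IOException {
        int b = in.read();
        while (b != -1 && isCSpace(b)) { // "%s" skips leading whitespace
            b = in.read();
        }
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream word = new ByteArrayOutputStream();
        while (b != -1 && !isCSpace(b)) { // one byte per C char
            word.write(b);
            b = in.read();
        }
        // The byte that ended the word has already been consumed here,
        // which is what the trailing "%c" (&ch) does in the C format string.
        return new String(word.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "hello world\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(sample));
        String w;
        while ((w = readWord(in)) != null) {
            System.out.println("word = " + w);
        }
    }
}
```

With a real file you would pass 'new BufferedInputStream(new FileInputStream(filename))' instead of the in-memory sample; the buffering keeps the byte-at-a-time reads cheap.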

In Java I read the binary file with

FileInputStream fis = new FileInputStream(filename); 
BufferedInputStream bin = new BufferedInputStream(fis);

and then read bytes and convert them to strings:

for(int j = 0; j < 200; j++) { 
    int size = 2; // char is 2 bytes 
    byte[] tempId3 = new byte[size]; 
    bin.read(tempId3, 0, size); 
    String id3 = new String (tempId3); 
    System.out.println(" id = " + id3);     
}

But the output is a bunch of garbage. Am I doing something wrong? Can I do this better?

EDIT: The C snippet this comes from (here) is:

fscanf(f, "%lld", &words);
fscanf(f, "%lld", &size);
vocab = (char *)malloc((long long)words * max_w * sizeof(char));
for (a = 0; a < N; a++) bestw[a] = (char *)malloc(max_size * sizeof(char));
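In the snippet above, 'fscanf(f, "%lld", ...)' parses the two counts from decimal text, so they are not 8-byte binary 'long' values; reading exactly 8 raw bytes for each only lines up by coincidence, which would fit the first two outputs looking readable. A minimal sketch, assuming (the snippet does not confirm this) that both numbers sit on the first line and end with '\n'. 'HeaderReader', 'readAsciiLine' and 'readHeader' are hypothetical names. It consumes exactly the header bytes, so the rest of the stream can still be read byte by byte afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class HeaderReader {

    // Reads bytes up to and including the next '\n'; returns them (without '\n') as ASCII text.
    static String readAsciiLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Parses the two decimal numbers that fscanf(f, "%lld", ...) would read; returns {words, size}.
    static long[] readHeader(InputStream in) throws IOException {
        String[] parts = readAsciiLine(in).trim().split("\\s+");
        if (parts.length < 2) {
            throw new IOException("unexpected header line");
        }
        return new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])};
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample header; a real file would come from a FileInputStream.
        byte[] sample = "12 4\n".getBytes(StandardCharsets.US_ASCII);
        long[] header = readHeader(new ByteArrayInputStream(sample));
        System.out.println("words = " + header[0] + ", size = " + header[1]);
    }
}
```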

Here is what I have:

FileInputStream fis = new FileInputStream(filename); 
BufferedInputStream bin = new BufferedInputStream(fis); 

int length = 1; 

System.out.println("1st: "); 
byte[] tempId = new byte[8]; 
bin.read(tempId, 0, 8); 
String id = new String (tempId, "US-ASCII"); 
System.out.println(" out = " + id); 

System.out.println("2nd: "); 
int size1 = 8; 
byte[] tempId2 = new byte[size1]; 
bin.read(tempId2, 0, size1); 
String id2 = new String (tempId2, "US-ASCII"); 
System.out.println(" out = " + id2); 



for(int j = 0; j < 20; j++) { 
    int size = 2; 
    byte[] tempId3 = new byte[size]; 
    bin.read(tempId3, 0, size); 
    String id3 = new String (tempId3, "US-ASCII"); 
    System.out.println(" out = " + id3);     
}

and this is the output I see. Apart from the first two 'long' numbers, the rest is nonsense (I expected characters).

[output screenshot]

PS: The C code is here (lines 44-60 are the part that reads the binary file).


2014-01-27 Daniel

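The 'new String(byte[])' constructor decodes using the system's default charset. That may be some flavor of UTF-8, but it may not be. Try 'System.out.println(System.getProperty("file.encoding"));' to find out what it is set to. I'm fairly sure C uses ASCII for characters (which would be compatible with UTF-8), but I'm not a C programmer. Also, post some of the garbage; to a trained eye it may not be nonsense. ;) – Radiodef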
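A small sketch of the comment's suggestion: print the platform default charset (what the one-argument 'new String(byte[])' constructor uses), then decode with an explicitly named charset so the result no longer depends on the machine. 'CharsetCheck' and 'decodeAscii' are hypothetical names, and US-ASCII is an assumption about how the C program stored its text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetCheck {

    // Decodes with an explicit charset instead of the platform default.
    static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The one-argument new String(byte[]) uses the default charset reported here.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        byte[] bytes = {72, 105}; // ASCII codes of 'H' and 'i'
        System.out.println(decodeAscii(bytes)); // prints Hi
    }
}
```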

Maybe a Reader would get you what you need? Use an InputStream to read binary data; Readers are meant for text.


2014-01-27 12:57:56
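A minimal sketch of the Reader approach suggested in this answer: wrap the byte stream in an 'InputStreamReader' with an explicit charset (US-ASCII here, an assumption about the file) and read text through it. 'ReaderExample' and 'firstLine' are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class ReaderExample {

    // Wraps a byte stream in a Reader that turns bytes into chars using US-ASCII.
    static String firstLine(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII))) {
            return reader.readLine(); // null if the stream is empty
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "hello world\nsecond line\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(firstLine(new ByteArrayInputStream(sample))); // hello world
    }
}
```

Both 'InputStreamReader' and 'BufferedReader' read ahead into internal buffers, so once a stream is wrapped this way you should not go back to reading raw bytes from the same underlying stream. This approach suits files (or files split into sections) that are entirely text.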

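You could try a constructor like this one and experiment with different charsets. A Java String is encoded in UTF-16, so one character takes 2 bytes, which may be why it doesn't work. Try US-ASCII.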


2014-01-27 12:59:55 NitroG42
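To illustrate the one-byte versus two-byte point: a C 'char' is one byte, while a Java 'char' is a 2-byte UTF-16 code unit. This sketch decodes the two ASCII bytes of "ab" both ways. Treating them as UTF-16 glues them into a single unrelated character (U+6162), the kind of garbage a width mismatch produces; with US-ASCII, the question's 2-byte reads would merely print the text two characters at a time. 'CharWidth', 'asUtf16' and 'asAscii' are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class CharWidth {

    static String asUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE); // 2 bytes per Java char
    }

    static String asAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII); // 1 byte per char
    }

    public static void main(String[] args) {
        byte[] fromC = {'a', 'b'}; // two one-byte C chars
        String glued = asUtf16(fromC);
        // prints: UTF-16BE: 1 char, U+6162
        System.out.printf("UTF-16BE: %d char, U+%04X%n", glued.length(), (int) glued.charAt(0));
        System.out.println("US-ASCII: " + asAscii(fromC)); // prints: US-ASCII: ab
    }
}
```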

Strings in Java are Unicode, so you have to take that into account. What encoding does your binary file use?


2014-01-27 13:01:42

I don't know! (So if the encoding in the binary file isn't Unicode on the C side, I won't be able to read it? What is C's default encoding?) – Daniel
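On the question about C's default encoding: roughly speaking, C does not attach an encoding to 'char' data the way Java does; a 'char' is a byte, and '%s' copies bytes as they are. For bytes in the ASCII range (0-127), US-ASCII, UTF-8 and ISO-8859-1 all decode to the same characters, so if the C program wrote plain ASCII text, the choice of Java charset barely matters. A sketch with the hypothetical helper 'decodesIdentically' that checks this; if the output is still garbage with US-ASCII, the bytes being read are most likely not text at all.

```java
import java.nio.charset.StandardCharsets;

final class AsciiCompat {

    // True if US-ASCII, UTF-8 and ISO-8859-1 all decode the bytes to the same string,
    // which holds exactly when every byte is in the ASCII range 0..127.
    static boolean decodesIdentically(byte[] bytes) {
        String ascii = new String(bytes, StandardCharsets.US_ASCII);
        return ascii.equals(new String(bytes, StandardCharsets.UTF_8))
                && ascii.equals(new String(bytes, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] text = "king queen".getBytes(StandardCharsets.US_ASCII);
        byte[] notText = {(byte) 0x80, 0x3F, 0x00, 0x41}; // contains a byte above 127
        System.out.println(decodesIdentically(text));    // true
        System.out.println(decodesIdentically(notText)); // false
    }
}
```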

String id3 = new String(tempId3, "US-ASCII");


2014-01-27 13:04:02 MariuszS

That didn't help; I added some output above. – Daniel

As said in the other comments, try the String constructor that takes a character encoding. That is:

String id3 = new String(tempId3, StandardCharsets.US_ASCII); // java.nio.charset.StandardCharsets, Java 7+

or:

String id3 = new String(tempId3, "US-ASCII");

and the other lines can probably stay the same.
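A sketch comparing the two constructor overloads from this answer. It uses 'StandardCharsets' from java.nio.charset (Java 7+) rather than a third-party 'Charsets' class such as Guava's; both expose a US_ASCII constant. The name-based overload needs the exact charset name, "US-ASCII" with a hyphen, and throws the checked 'UnsupportedEncodingException' if the name is not recognized. 'StringCtorOverloads', 'byCharset' and 'byName' are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class StringCtorOverloads {

    // Charset overload: no checked exception, and a typo in the constant fails to compile.
    static String byCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Name overload: the name is looked up at run time and may be rejected.
    static String byName(byte[] bytes, String charsetName) throws UnsupportedEncodingException {
        return new String(bytes, charsetName);
    }

    public static void main(String[] args) {
        byte[] bytes = {65, 66, 67}; // "ABC" in ASCII
        System.out.println(byCharset(bytes)); // ABC
        try {
            System.out.println(byName(bytes, "US-ASCII")); // ABC
        } catch (UnsupportedEncodingException e) {
            System.out.println("unsupported charset: " + e.getMessage());
        }
    }
}
```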

The C code you posted doesn't actually read any characters. It only allocates memory for the scanning that happens later.


2015-03-04 13:37:30 Dmitry
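As this answer points out, the posted lines only allocate memory; the actual reading happens in the lines the question refers to (44-60). The snippet appears to come from word2vec's distance.c. Assuming that, those lines read each word and then 'size' floats with 'fread(..., sizeof(float), 1, f)', i.e. raw 4-byte binary values, and printing such bytes as characters would plausibly produce the garbage described. A minimal sketch for that binary part, assuming little-endian IEEE 754 floats (typical of x86 machines). 'RawFloats' and 'readFloatsLE' are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class RawFloats {

    // Reads `count` raw 4-byte floats, assumed little-endian (as fwrite produces on x86).
    static float[] readFloatsLE(InputStream in, int count) throws IOException {
        byte[] raw = new byte[count * Float.BYTES];
        // readFully keeps reading until the array is full (or throws EOFException);
        // a single InputStream.read call may return fewer bytes than requested.
        new DataInputStream(in).readFully(raw);
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        float[] values = new float[count];
        for (int i = 0; i < count; i++) {
            values[i] = buf.getFloat();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Build 8 bytes the way a little-endian C program would fwrite {1.5f, -2.0f}.
        byte[] sample = ByteBuffer.allocate(2 * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putFloat(1.5f)
                .putFloat(-2.0f)
                .array();
        float[] v = readFloatsLE(new ByteArrayInputStream(sample), 2);
        System.out.println(v[0] + " " + v[1]); // 1.5 -2.0
    }
}
```

Note that the question's loops ignore the return value of 'bin.read(...)'; 'readFully' avoids silently working with a partially filled array.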


