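Reading encoded strings from a .dbf file

I have a .dbf file and I want to read its data, but the strings are unreadable. I don't know the encoding of the strings! Can I find it out? Is it possible to get the string encoding from the .dbf file itself? Is it possible to get unreadable strings from a .dbf file? Is it possible to get an unreadable ANSI-encoded string? Since the strings are unreadable, does that mean they are encoded in some way?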

Edit:

The code below is how I connect to the .dbf file and read it:

using (OleDbConnection con = new OleDbConnection(constr))
{
    var sql =
        "select name, family, account, is_no, code, bdate, is_pl, father from CP where account like '%23854%' ";

    OleDbCommand cmd = new OleDbCommand(sql, con);
    con.Open();
    DataSet ds = new DataSet();
    OleDbDataAdapter da = new OleDbDataAdapter(cmd);
    da.Fill(ds);

    var dt = ds.Tables[0];
    foreach (DataRow row in dt.Rows)
    {
        var account = row["account"];
    }
}

It returns 23854æ∞ì for account.

EDIT2:

I used a third-party tool to get some information about my .dbf file, shown in the following image:

enter image description here

EDIT3:

Here is a screenshot of the data in DBF Commander Pro.

The unreadable characters are Arabic/Persian.

enter image description here

Source

2014-01-12 Parid0kht

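Later edit:

So now the only problem is the conversion.

The encoding needed is probably one of the Arabic character sets (I searched Wikipedia). ISO 639-1, ISO 639-2 and ISO 639-3 are language-code standards rather than character sets, so Java does not accept them as charset names. Arabic charset names that Java does accept include:

"ISO-8859-6"
"windows-1256"

For example:

private static String getAsciz(byte[] bytes, int offset, int offset2) {
    // Stop at the first NUL byte, if there is one.
    for (int i = offset; i < offset2; ++i) {
        if (bytes[i] == 0) {
            offset2 = i;
        }
    }
    // Assumption: Windows Arabic. Try another charset name if the text still looks wrong.
    final String encoding = "windows-1256";
    try {
        return new String(bytes, offset, offset2 - offset, encoding).trim();
    } catch (UnsupportedEncodingException e) {
        throw new IllegalStateException("Charset not installed: " + encoding);
    }
}

Or, when using a third-party library, perhaps with a hack that undoes the library's decoding (note that the library may have used a variable encoding: the current platform encoding):

String s = thirdParty.getColumn("NAME");

// Reconstruct the bytes (Windows Latin-1, Western Europe)
byte[] bytes = s.getBytes("Cp1252");

// Decode them again with the assumed real encoding (a guess, see above)
s = new String(bytes, "windows-1256");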
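The round trip behind that hack can be checked in isolation. The sketch below is only an illustration: the class and method names are made up, and it assumes the real encoding is windows-1256 and the wrongly applied one is windows-1252. It garbles a short Arabic word the way a library might, then repairs it. The repair is only lossless when every byte has a mapping in the wrongly applied charset.

```java
import java.nio.charset.Charset;

public class UndoWrongDecoding {

    // Re-encode with the charset that was wrongly applied, then decode with the intended one.
    public static String redecode(String garbled, Charset wronglyUsed, Charset intended) {
        byte[] originalBytes = garbled.getBytes(wronglyUsed);
        return new String(originalBytes, intended);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        Charset cp1256 = Charset.forName("windows-1256"); // assumption: the data is Windows Arabic

        // Arabic letters seen, lam, alef, meem, written as escapes to avoid source-encoding issues.
        String original = "\u0633\u0644\u0627\u0645";

        // Simulate a library that decodes windows-1256 bytes as windows-1252.
        String garbled = new String(original.getBytes(cp1256), cp1252);

        String repaired = redecode(garbled, cp1252, cp1256);
        System.out.println("garbled:  " + garbled);
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

If the repaired text is still wrong for every charset you try, the data is probably stored in an encoding that Java does not ship.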

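Old answer:

.dbf is a binary format with fixed-length records. Within each record, the field values are plain character arrays (most likely ANSI).

My guess is that you tried to read the file as text.

Or the .dbf file is encrypted. Look at the file with a hex editor.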
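If no hex editor is at hand, a few lines of Java give the same view. This is a rough sketch, not a full tool: the class name `HexPeek` and the default path are placeholders, and `readNBytes` needs Java 9 or later. In a plain, unencrypted .dbf you would expect to see the field names as readable ASCII near the start of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexPeek {

    // Formats bytes as rows of 16: offset, hex values, then printable ASCII (others shown as '.').
    public static String hexDump(byte[] data, int length) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(length, data.length);
        for (int row = 0; row < n; row += 16) {
            sb.append(String.format("%04X ", row));
            StringBuilder ascii = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < n) {
                    int b = data[i] & 0xFF;
                    sb.append(String.format(" %02X", b));
                    ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
                } else {
                    sb.append("   "); // pad a short final row
                }
            }
            sb.append("  ").append(ascii).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; pass the real file as the first argument.
        String path = args.length > 0 ? args[0] : "C:/aaa/bbb.dbf";
        byte[] head = new byte[256];
        int read;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            read = in.readNBytes(head, 0, head.length); // Java 9+
        }
        System.out.print(hexDump(head, read));
    }
}
```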

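You can read it as a binary block. First comes a header section with the column definitions. Then come the actual records, each starting with a deletion marker.

Since this is an old format, there are many libraries. You did not mention which programming language you want to use, but with a hex dump and some format information from the internet you can easily make a dbf reader.

A simple conversion to tab-separated text:

Untested and in Java, but it shows that this is trivial. You can then use Excel or something else to convert, and OLE DB. Note: for the input, in, I use ISO-8859-1 here, and for the output, out, UTF-8. I also write a BOM (byte order mark) so that the file is recognized as UTF-8.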

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Dbf3ToTsv {

    // While testing, stop after roughly the first 100 records.
    private static final boolean TEST = true;

    private static class FieldDef {
        String name;
        char type;
        int length;
        int decimals;
    }

    public static void main(String[] args) {
        File dbfFile = new File("C:/aaa/bbb.dbf");
        String csvName = dbfFile.getName().replaceFirst("(?i)\\.dbf$", "") + ".csv";
        File csvFile = new File(dbfFile.getParentFile(), csvName);
        try (DataInputStream in = new DataInputStream(
                     new BufferedInputStream(new FileInputStream(dbfFile)));
             PrintWriter out = new PrintWriter(csvFile, "UTF-8")) {
            byte[] header = new byte[0x20];
            in.readFully(header); // readFully guarantees the whole buffer is filled

            // Version:
            switch (header[0x00]) {
                case 0x03:
                    System.out.println("dBaseIII without Memo");
                    break;
                case -128 + 0x03: // 0x83 as a signed byte
                    System.out.println("dBaseIII with Memo");
                    break;
                default:
                    throw new UnsupportedOperationException("dBase Version not 3");
            }

            // Multi-byte numbers in the header are little-endian.
            int recordCount = getInt(header, 0x04);
            int headerSize = getShort(header, 0x08);
            int recordSize = getShort(header, 0x0a);

            List<FieldDef> fieldDefs = new ArrayList<>();
            byte[] fieldDefBytes = new byte[0x20];
            int offset = header.length;
            out.print("\uFEFF"); // UTF-8 BOM to distinguish it from Windows ANSI.
            out.print("DEL"); // Deletion marker.
            while (offset + 1 < headerSize) {
                in.readFully(fieldDefBytes);
                offset += fieldDefBytes.length; // Advance, otherwise the loop never ends.
                FieldDef fieldDef = new FieldDef();
                fieldDef.name = getAsciz(fieldDefBytes, 0, 11);
                fieldDef.type = (char) fieldDefBytes[11];
                // #4 int - field data address.
                fieldDef.length = 0xFF & fieldDefBytes[16];
                fieldDef.decimals = 0xFF & fieldDefBytes[17];
                out.print('\t');
                out.print(fieldDef.name);
                fieldDefs.add(fieldDef);
                System.out.printf("%-11s %c (%d, %d)%n", fieldDef.name,
                        fieldDef.type, fieldDef.length, fieldDef.decimals);
            }
            out.println();
            int b = in.read();
            assert b == 0x0d; // Header terminator.

            byte[] record = new byte[recordSize];
            for (int recno = 0; recno < recordCount; ++recno) {
                if (TEST && recno > 100) {
                    break;
                }
                in.readFully(record);
                //boolean deleted = (0xFF & record[0]) != 0x20; // == 0x2A '*'
                String deletionMark = getAsciz(record, 0, 1);
                out.print(deletionMark);
                offset = 1;
                for (FieldDef fieldDef : fieldDefs) {
                    out.print('\t');
                    String fieldValue = getAsciz(record, offset, offset + fieldDef.length);
                    out.print(fieldValue);
                    offset += fieldDef.length;
                }
                out.println();
            }
            // assert in.read() == 0x1A; // End-of-file byte.
        } catch (IOException ex) {
            Logger.getLogger(Dbf3ToTsv.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    // Reads a 4-byte little-endian integer.
    private static int getInt(byte[] bytes, int offset) {
        int n = 0;
        for (int i = 0; i < 4; ++i) {
            n = (n << 8) | (0xFF & bytes[offset + 4 - 1 - i]);
        }
        return n;
    }

    // Reads a 2-byte little-endian unsigned integer.
    private static int getShort(byte[] bytes, int offset) {
        int n = 0;
        for (int i = 0; i < 2; ++i) {
            n = (n << 8) | (0xFF & bytes[offset + 2 - 1 - i]);
        }
        return n;
    }

    // Decodes bytes[offset, offset2) up to the first NUL byte and trims padding.
    private static String getAsciz(byte[] bytes, int offset, int offset2) {
        for (int i = offset; i < offset2; ++i) {
            if (bytes[i] == 0) {
                offset2 = i;
            }
        }
        return new String(bytes, offset, offset2 - offset, StandardCharsets.ISO_8859_1).trim();
    }
}

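Source

2014-01-12 13:03:17

Yes, I read it as text.. how should I read the file? – Parid0kht

I use 'C#', and I use 'OleDbConnection' to connect to this database and 'OleDbCommand' to query it.. is that the right way? – Parid0kht

I looked at the '.dbf' in a hex editor.. what is the point of the hex editor? (Sorry if it is obvious, but it isn't to me!!) – Parid0kht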

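A .dbf file is a mixed binary and encoded-text file format. By encoded I don't mean encrypted; I mean that the text is encoded in a code page that depends on the language the .dbf file was used with, such as CP1252 (Windows English) or CP1251 (Cyrillic).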
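Many dBASE IV, FoxPro and Visual FoxPro writers record that code page in the file itself, as a one-byte "language driver ID" at offset 29 (0x1D) of the 32-byte header. The sketch below reads it. Treat it as an illustration under assumptions: the class name and path are placeholders, the lookup table is deliberately partial and limited to commonly documented values, and older tools often leave the byte at 0, in which case the file simply does not say which code page it uses.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class DbfCodePageMark {

    // Offset of the language driver ID (code page mark) in the 32-byte file header.
    static final int LDID_OFFSET = 0x1D;

    // Partial table of commonly documented marks; returns null for anything else.
    public static String charsetNameFor(int ldid) {
        switch (ldid) {
            case 0x01: return "IBM437";       // US MS-DOS
            case 0x02: return "IBM850";       // International MS-DOS
            case 0x03: return "windows-1252"; // Windows ANSI
            case 0x64: return "IBM852";       // Eastern European MS-DOS
            case 0x7D: return "windows-1255"; // Hebrew Windows
            case 0x7E: return "windows-1256"; // Arabic Windows
            case 0xC8: return "windows-1250"; // Eastern European Windows
            case 0xC9: return "windows-1251"; // Russian Windows
            default:   return null;           // 0x00 means no code page was stored
        }
    }

    // Extracts the mark from an already loaded header as an unsigned value.
    public static int readLdid(byte[] header) {
        return header[LDID_OFFSET] & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; pass the real file as the first argument.
        String path = args.length > 0 ? args[0] : "C:/aaa/bbb.dbf";
        byte[] header = new byte[0x20];
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            in.readFully(header);
        }
        int ldid = readLdid(header);
        String name = charsetNameFor(ldid);
        System.out.printf("Language driver ID 0x%02X -> %s%n", ldid,
                name != null ? name : "not in this table (or not set)");
    }
}
```

A mark of 0, or a mark that names a Western code page while the data is clearly not Western, suggests the text was written in an encoding the format does not know about.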

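If you want programmatic access and control, you will need either to write your own library or to use one of the many already out there.

If you are using a library correctly and still get nonsense, the file may be encrypted, or it may be corrupt.

Source

2014-01-12 16:29:06

I'm sure there is no corruption.. As I said in my comment above, I use OleDbCommand, OleDbDataAdapter and DataSet.. but it returns unreadable data. I don't know about encryption or how I can handle it!! For example, if I decrypt the data, should it then work? I can query the file and read it as text. – Parid0kht

If you would like me to examine the dbf file myself, feel free to contact me privately. By "encrypted" we mean that it has been password protected and the contents scrambled using that password. –

Thank you, that's really great.. I really don't know how to read the file, but the zipped file is 187M. – Parid0kht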

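Try opening the file with DBF Commander Pro; it supports dBase 3. If the file opens, please share a screenshot of the table. The encoding set in your file may be wrong, in which case you just need to set the correct charset flag. If so, click Tools -> Set Code Page, then choose the appropriate encoding from the list.

Source

2014-01-13 17:43:55 Oleg

Thanks, I have just added the screenshot. I tried every code page available in the software, but all of them show unreadable data. – Parid0kht

I should mention that these unreadable characters are 'Arabic/Persian' (I don't know exactly which one). – Parid0kht

Well, it doesn't look like encrypted data, because only the letters are unreadable while the digits are correct. I still think there is some problem with the charset. Could you please run the following SQL query: 'SELECT * FROM table_name INTO DBF "D:\table1.dbf"', and then send the resulting DBF file to us at support[a]elphsoft.com. We will try to "play" with the charsets. – Oleg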

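Your data looks like Iran System encoded data. It is a very special encoding that was used in Iran in the early DOS days (the FoxPro days!). You can find a C# converter here: https://github.com/mohsen-d/IranSystemConvertor

More info (in Persian)

Source

2014-08-11 08:32:57 VahidN

Thanks, but it doesn't work for me! There are some '?' characters in every conversion. – Parid0kht

So does it work or not? Or does it only partly work? If it partly works, try the 'OLE DB Provider for Visual FoxPro 9.0' http://www.microsoft.com/en-us/download/details.aspx?id=14839 with this connection string: 'var connectionString = "Provider=VFPOLEDB.1;Data Source=D:\path\rep.dbf;Password=;Collating Sequence=MACHINE";' – VahidN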
