後編輯:
所以,現在唯一的問題是轉換。
所需的編碼可以是(我在維基百科搜索):
- 「ISO-639-1」
- 「ISO-639-2」
- 「ISO-639-3」
或者:
private static String getAsciz(byte[] bytes, int offset, int offset2) {
for (int i = offset; i < offset2; ++i) {
if (bytes[i] == 0) {
offset2 = i;
}
}
final String encoding = "ISO-639-1";
try {
return new String(bytes, offset, offset2 - offset, encoding).trim();
} catch (UnsupportedEncodingException e) {
throw new IllegalStateException("Charset not installed: " + encoding);
}
}
或使用第三方librar時Ÿ也許通過撤消編碼黑客(注意,這可能是一個可變編碼:目前的平臺編碼):
String s = thirdParty.getColumn("NAME");
// Reconstruct the bytes (Windows Latin-1, Western Europe)
byte[] bytes = s.getBytes("Cp1252");
s = new String(bytes, "ISO-639-1");
老答案:
.dbf
是一個二進制格式具有固定長度的記錄。在每個記錄中,字段值都是普通字符數組(最可能是ANSI)。
我的猜測是,您嘗試將文件作爲文本讀取。
或者.dbf文件被加密。用十六進制編輯器查看文件。
您可以將其作爲二進制塊讀取。首先是帶有列定義的標題部分。然後用刪除標記來實際記錄。
由於這是一種舊格式,因此有許多庫。您沒有提及要使用哪種編程語言,但通過使用十六進制轉儲和互聯網中的某些格式信息,您可以輕鬆製作dbf閱讀器。
一個簡單的轉換以製表符分隔文本:
未測試和在Java中,但示出,這是微不足道的。然後,您可以使用Excel或其他方式進行轉換和OLE DB。注意:作爲輸入in
我在這裏使用ISO-8859-1,並且輸出爲out
UTF-8。我還爲UTF-8識別編寫了一個BOM(文件標記開始)。
private static final boolean TEST = true;
private static class FieldDef {
String name;
char type;
int length;
int decimals;
}
public static void main(String[] args) {
File dbfFile = new File("C:/aaa/bbb.dbf");
String csvName = dbfFile.getName().replaceFirst("(?i)\\.dbf$", "") + ".csv";
File csvFile = new File(dbfFile.getParentFile(), csvName);
try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(dbfFile));
PrintWriter out = new PrintWriter(csvFile, "UTF-8")) {
byte[] header = new byte[0x20];
in.read(header);
// Version:
switch (header[0x00]) {
case 0x03:
System.out.println("dBaseIII without Memo");
break;
case -128 + 0x03:
System.out.println("dBaseIII with Memo");
break;
default:
throw new UnsupportedOperationException("dBase Version not 3");
}
int recordCount = getInt(header, 0x04);
int headerSize = getShort(header, 0x08);
int recordSize = getShort(header, 0x0a);
List<FieldDef> fieldDefs = new ArrayList<>();
byte[] fieldDefBytes = new byte[0x20];
int offset = header.length;
out.print("\uFFFE"); // UTF-8 BOM to distinghuish it from Windows ANSI.
out.print("DEL"); // Deletion marker.
while (offset + 1 < headerSize) {
in.read(fieldDefBytes);
FieldDef fieldDef = new FieldDef();
fieldDef.name = getAsciz(fieldDefBytes, 0, 11);
fieldDef.type = (char)fieldDefBytes[11];
// #4 int - field data address.
fieldDef.length = 0xFF & fieldDefBytes[16];
fieldDef.decimals = 0xFF & fieldDefBytes[17];
out.print('\t');
out.print(fieldDef.name);
fieldDefs.add(fieldDef);
System.out.printf("%-11s %c (%d, %d)%b", fieldDef.name,
fieldDef.type, fieldDef.length, fieldDef.decimals);
}
out.println();
int b = in.read();
assert b == 0x0d;
byte[] record = new byte[recordSize];
for (int recno = 0; recno < recordCount; ++recno) {
if (TEST && recno > 100) {
break;
}
in.read(record);
//boolean deleted = (0xFF & record[0]) != 0x20; // == 0x2A '*'
String deletionMark = getAsciz(record, 0, 1);
out.print(deletionMark);
offset = 1;
for (FieldDef fieldDef : fieldDefs) {
out.print('\t');
String fieldValue = getAsciz(record, offset, offset + fieldDef.length);
out.print(fieldValue);
offset += fieldDef.length;
}
out.println();
}
// assert in.read() == 0x1A; // End-of-file byte.
} catch (IOException ex) {
Logger.getLogger(Dbf3ToTsv.class.getName()).log(Level.SEVERE, null, ex);
}
}
private static int getInt(byte[] bytes, int offset) {
int n = 0;
for (int i = 0; i < 4; ++i) {
n = (n << 8) | (0xFF & bytes[offset + 4 - 1 - i]);
}
return n;
}
private static int getShort(byte[] bytes, int offset) {
int n = 0;
for (int i = 0; i < 2; ++i) {
n = (n << 8) | (0xFF & bytes[offset + 2 - 1 - i]);
}
return n;
}
private static String getAsciz(byte[] bytes, int offset, int offset2) {
for (int i = offset; i < offset2; ++i) {
if (bytes[i] == 0) {
offset2 = i;
}
}
return new String(bytes, offset, offset2 - offset, StandardCharsets.ISO_8859_1).trim();
}
是的,我讀它作爲text..how我應該讀文件? – Parid0kht
我使用'C#',我用OleDbConnection來連接到這個數據庫和'OleDbCommand'來查詢數據庫..這種方式對嗎? – Parid0kht
我在十六進制編輯器中看到'.dbf' ..十六進制編輯器的要點是什麼? (對不起,如果它是明確的,但對我來說不是!!) – Parid0kht