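Identifying and matching non-ASCII characters in a file

I am trying to read a delimited file and parse its contents. Unlike CSV, the delimiter, string qualifier, etc. are non-ASCII, i.e. U+0014 and U+00FE. However, I am unable to detect the string qualifier (FE). Is this because the character's value is larger, or is it something else?
Below is a simple program that illustrates the core problem. How can I make this work? Here is a link to a very small test file: https://www.dropbox.com/s/1cilircwc3pq78c/nonascii.dat?dl=0
Thanks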
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
public class CharMatch {
    public static void main(String[] args)
            throws Exception {
        final String pathname = "/home/vinayb/Downloads/nonascii.dat";
        final File file = new File(pathname);
        final String encoding = "UTF-8";
        final PrintStream out = new PrintStream(System.out, true, encoding);
        final Reader r = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), encoding));
        final LineIterator it = FileUtils.lineIterator(file, encoding);
        try {
            // read a line
            final String line = it.nextLine();
            final char[] chars = line.toCharArray();
            for (char c : chars) {
                // Cast to int to get the code unit; Character.getNumericValue(c)
                // would return the digit value the character represents instead.
                out.println(c + " , with decimal value: " + (int) c + " and hex value: " + Integer.toHexString(c));
            }
            out.println("------------------------------------");
            final String expectedDelimiter = fromUnicode("0014");
            final String expectedStringQualifier = fromUnicode("00FE");
            out.println("##### expected delimiter:" + expectedDelimiter);
            out.println("##### expected string qualifier:" + expectedStringQualifier);
            String[] items = line.split(expectedDelimiter);
            out.println("#### " + items.length + " " + items[0]);
            if (line.contains(expectedDelimiter)) {
                out.println("Found delimiter"); // =======> can match this
            }
            if (line.contains(expectedStringQualifier)) {
                out.println("Found string qualifier"); // =======> can't match this
            }
        } finally {
            LineIterator.closeQuietly(it);
        }
    }

    // Converts a hex code point such as "00FE" into a one-character String.
    private static String fromUnicode(String codePoint) {
        return "" + (char) Integer.parseInt(codePoint, 16);
    }
}
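A side note on inspecting characters: `Character.getNumericValue(c)` returns the numeric value a character represents as a digit (and -1 when it has none), while a cast to `int` gives the UTF-16 code unit. A minimal sketch of the difference follows; the class name `CodeUnitDemo` and the helper `hex` are made up for illustration.

```java
class CodeUnitDemo {
    // Formats a char's UTF-16 code unit as four uppercase hex digits.
    public static String hex(char c) {
        return String.format("%04X", (int) c);
    }

    public static void main(String[] args) {
        char qualifier = '\u00FE'; // the string qualifier from the question
        char delimiter = '\u0014'; // the field delimiter from the question

        // Digit value: this letter has none, so the result is -1.
        System.out.println(Character.getNumericValue(qualifier)); // -1
        // Code unit value, in decimal and in hex.
        System.out.println((int) qualifier); // 254
        System.out.println(hex(qualifier));  // 00FE
        System.out.println(hex(delimiter));  // 0014
    }
}
```

Printing the code unit this way makes it easier to see which characters the decoder actually produced for each byte of the file.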
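"String qualifier character"? What is that supposed to be? – fge 2015-04-01 21:35:01
It is a character used to qualify strings. A commonly used qualifier is ". For example, in CSV we would use qualifiers like this: "John Doe","123, Main Street". In this case the qualifier is 00FE. See this link for what it looks like: http://en.wikipedia.org/wiki/ISO/IEC_8859-1 – 2015-04-01 21:40:57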
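
Given the ISO/IEC 8859-1 link above, one possible explanation is that the file stores the qualifier as the single byte 0xFE. This is an assumption, since the question does not state the file's real encoding. A lone 0xFE byte is not valid UTF-8, so a UTF-8 decoder replaces it with U+FFFD, whereas 0x14 decodes to the same character under both charsets. Here is a minimal sketch using made-up sample bytes rather than the real file; the class name `DecodeDemo` and its helper methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Hypothetical record: qualifier, "John", qualifier, delimiter,
    // qualifier, "Doe", qualifier, with the qualifier as byte 0xFE.
    public static byte[] sample() {
        return new byte[] {
            (byte) 0xFE, 'J', 'o', 'h', 'n', (byte) 0xFE,
            0x14,
            (byte) 0xFE, 'D', 'o', 'e', (byte) 0xFE
        };
    }

    public static String asLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static String asUtf8(byte[] raw) {
        // Malformed input is replaced with U+FFFD by this constructor.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String latin1 = asLatin1(sample());
        String utf8 = asUtf8(sample());

        System.out.println(latin1.contains("\u00FE")); // true
        System.out.println(utf8.contains("\u00FE"));   // false
        System.out.println(utf8.contains("\uFFFD"));   // true
        System.out.println(utf8.contains("\u0014"));   // true
    }
}
```

If the real file behaves like this sample, passing "ISO-8859-1" instead of "UTF-8" as the encoding to `FileUtils.lineIterator` would be the first thing to try.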