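Expat (C) - "invalid token" on (almost) every line
I have some XML that I want to process with Expat in C. The XML can be parsed in Java, so I have no reason to believe it is malformed. Moreover, the C code I have will parse a string that I insert manually, but it fails to parse my XML file.
Here is the code (with some debugging output I have added; if God had wanted us to use debuggers, he wouldn't have given us printf):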
static void XMLCALL
starthandler(void *data, const XML_Char *name, const XML_Char **attr)
{
    int i;
    if (strcmp(name, "file") == 0) {
        for (i = 0; attr[i]; i += 2) {
            if (strcmp(attr[i], "path") == 0) {
                printf("File is at %s\n", attr[i + 1]);
            }
        }
    }
}

int main(int argc, char *argv[])
{
    FILE* inXML;
    ssize_t read;
    char* line;
    size_t len = 0;
    XML_Parser p_ctrl = XML_ParserCreate("UTF-8");
    if (!p_ctrl) {
        fprintf(stderr, "Could not create parser\n");
        exit(-1);
    }
    XML_SetStartElementHandler(p_ctrl, starthandler);
    inXML = fopen(argv[1], "r");
    if (inXML == NULL) {
        fprintf(stderr, "Could not open %s\n", argv[1]);
        XML_ParserFree(p_ctrl);
        exit(-1);
    }
    while ((read = getline(&line, &len, inXML)) != -1) {
        printf("Line is %s", line);
        enum XML_Status status = XML_Parse(p_ctrl, line, len, 0);
        if (status == 0) {
            enum XML_Error errcde = XML_GetErrorCode(p_ctrl);
            printf("ERROR: %s\n", XML_ErrorString(errcde));
            printf("Error at column number %lu\n", XML_GetCurrentColumnNumber(p_ctrl));
            printf("Error at line number %lu\n", XML_GetCurrentLineNumber(p_ctrl));
        }
        free(line);
        line = NULL;
        len = 0;
    }
    XML_ParserFree(p_ctrl);
    fclose(inXML);
    return 0;
}
This is the XML file I am trying to parse:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE threadrecordml [
<!ELEMENT threadrecordml (file)*>
<!ATTLIST threadrecordml version CDATA #FIXED "0.1">
<!ATTLIST threadrecordml xmlns CDATA #FIXED "http://cartesianproduct.wordpress.com">
<!ELEMENT file EMPTY>
<!ATTLIST file thread CDATA #REQUIRED>
<!ATTLIST file path CDATA #REQUIRED>
]>
<threadrecordml xmlns="http://cartesianproduct.wordpress.com">
<file thread="1" path="tester_1.xml" />
<file thread="3" path="tester_3.xml" />
<file thread="2" path="tester_2.xml" />
<file thread="4" path="tester_4.xml" />
<file thread="5" path="tester_5.xml" />
<file thread="6" path="tester_6.xml" />
<file thread="7" path="tester_7.xml" />
<file thread="8" path="tester_8.xml" />
<file thread="9" path="tester_9.xml" />
<file thread="10" path="tester_10.xml" />
<file thread="11" path="tester_11.xml" />
<file thread="12" path="tester_12.xml" />
<file thread="13" path="tester_13.xml" />
<file thread="14" path="tester_14.xml" />
<file thread="15" path="tester_15.xml" />
<file thread="16" path="tester_16.xml" />
<file thread="17" path="tester_17.xml" />
<file thread="18" path="tester_18.xml" />
</threadrecordml>
Here is a sample of the output...
[email protected]:/n/staffstore/adrianm/optGenC$ ./optgenc ../tester_control.xml
Line is <?xml version="1.0" encoding="UTF-8" standalone="no"?>
ERROR: not well-formed (invalid token)
Error at column number 0
Error at line number 2
Line is <!DOCTYPE threadrecordml [
ERROR: not well-formed (invalid token)
Error at column number 0
Error at line number 3
Line is <!ELEMENT threadrecordml (file)*>
ERROR: not well-formed (invalid token)
Error at column number 0
Error at line number 4
Line is <!ATTLIST threadrecordml version CDATA #FIXED "0.1">
ERROR: not well-formed (invalid token)
Error at column number 0
(and so on for all the lines)
If I "cheat" and add this line after the read...
line = "<file thread=\"1\" path=\"tester.xml\" />";
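then that line is parsed (though of course the code then breaks for other reasons).
So something seems to go wrong when reading from the file on disk... is it being read as 16-bit? But changing the parser's encoding to NULL or UTF-16 seems to make no difference.
Can anyone offer an explanation? (In case it makes any difference, I have run this code on 64-bit OSX and Linux machines and get the same problem on both.)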
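Do the lines begin with a newline, given that you get your first error on line 2 for the 'xml' tag? Otherwise, there may be some other unexpected character at the start of the file. –
Good point - I hadn't noticed that it starts from line 2. – adrianmcmenamin
Looking at the file in a hex editor shows no stray characters - each line ends with \x0A and that is it. – adrianmcmenamin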
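One thing that stands out in the loop from the question: XML_Parse is given `len`, which getline() sets to the size of the allocated buffer, rather than `read`, the number of bytes actually read. If the buffer is larger than the line, the parser would also be handed whatever bytes follow the line, which might explain an invalid token at column 0 of the next line. Below is a minimal sketch of a read loop that passes the byte count instead. It uses only stdio so that it stands alone; `chunk_fn`, `feed_lines`, `struct tally` and `count_bytes` are hypothetical names for illustration, and the callback stands in for the XML_Parse call.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() and ssize_t */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical consumer type: receives exactly nbytes bytes starting at buf.
 * With Expat, an implementation could forward to
 * XML_Parse(parser, buf, (int)nbytes, is_final) and return 0 on error. */
typedef int (*chunk_fn)(void *ctx, const char *buf, size_t nbytes, int is_final);

/* Reads `in` line by line and hands each line to `fn`.
 * Returns 0 on success, -1 if the consumer reported a failure. */
static int feed_lines(FILE *in, chunk_fn fn, void *ctx)
{
    char *line = NULL;  /* must start as NULL so getline() allocates */
    size_t cap = 0;     /* buffer capacity, managed by getline() */
    ssize_t nread;      /* bytes actually read for the current line */
    int rc = 0;

    assert(in != NULL && fn != NULL);

    while ((nread = getline(&line, &cap, in)) != -1) {
        /* Pass nread, not cap: cap is the allocation size and is
         * usually larger than the line itself. */
        if (!fn(ctx, line, (size_t)nread, 0)) {
            rc = -1;
            break;      /* stop at the first failure */
        }
    }

    /* Signal end of input with an empty final chunk. */
    if (rc == 0 && !fn(ctx, line ? line : "", 0, 1))
        rc = -1;

    free(line);         /* one free after the loop; getline() reuses the buffer */
    return rc;
}

/* Example consumer for illustration: tallies what it was given. */
struct tally { size_t bytes; size_t chunks; int finals; };

static int count_bytes(void *ctx, const char *buf, size_t nbytes, int is_final)
{
    struct tally *t = ctx;
    (void)buf;
    t->bytes += nbytes;
    if (is_final)
        t->finals++;
    else
        t->chunks++;
    return 1;           /* nonzero means success */
}
```

In the question's code the equivalent change would be to initialise `line` to NULL and call XML_Parse(p_ctrl, line, read, 0), followed by a last call with a zero length and the final flag set once the file is exhausted. I have not run this against the file in the question, so treat it as a suggestion to try rather than a confirmed diagnosis.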