2014-02-11 72 views

libxml HTML parsing error using C

This is the HTML file that I am trying to parse with libxml:

<html> 
<head> 
     <title>Hello World Page</title> 
     <link rel="stylesheet" type="text/css" href="http://csszengarden.com/214/214.css?v=8may2013"> 
</head> 
<body> 
    <h3>Hello World</h3> 
    <br> 
    <p>Questo e un paragrafo.</p> 
    <a src="/">LINK</a> 
</body> 
</html> 

To parse it, I am using this example program, which I took from the libxml parsing tutorial.

#include <stdio.h> 
#include <libxml/parser.h> 
#include <libxml/tree.h> 

static void print_element_names(xmlNode * a_node); 

int main() 
{ 
    xmlDoc   *doc = NULL; 
    xmlNode  *root_element = NULL; 
    const char  *Filename = "file.xml"; 
    doc = xmlReadFile(Filename, NULL, 0); 

    if (doc == NULL) {
        printf("error: could not parse file %s\n", Filename);
    } else {
        root_element = xmlDocGetRootElement(doc);
        print_element_names(root_element);
        xmlFreeDoc(doc);
    }
    xmlCleanupParser(); 
    return (0); 
} 

static void print_element_names(xmlNode * a_node) 
{ 
    xmlNode *cur_node = NULL; 
    for (cur_node = a_node; cur_node; cur_node = cur_node->next) { 
     if (cur_node->type == XML_ELEMENT_NODE) 
      printf("node type: Element, name: %s\n", cur_node->name); 
     print_element_names(cur_node->children); 
    } 
} 

It returns this series of errors:

file.xml:5: parser error : Opening and ending tag mismatch: link line 4 and head 
    </head> 
     ^
file.xml:11: parser error : Opening and ending tag mismatch: br line 8 and body 
    </body> 
     ^
file.xml:12: parser error : Opening and ending tag mismatch: body line 6 and html 
</html> 
    ^
file.xml:12: parser error : Premature end of data in tag head line 2 
</html> 
    ^
file.xml:12: parser error : Premature end of data in tag html line 1 
</html> 
    ^
error: could not parse file file.xml 

I'm a libxml newbie. I want to build a tree from the HTML file and extract data from it. What should I change, in the program or in the HTML code?


oops, updated :) – user3298273

Answer

1

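xmlReadFile parses XML files. You have an HTML file, not an XML file, and HTML allows unclosed tags such as <link> and <br> that are not well-formed XML, which is why the parser reports tag mismatches. To parse an HTML file, use htmlReadFile instead.[1]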


  1. Its documentation erroneously says that it parses XML regardless.