2017-09-06

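Extracting information from XML

Yesterday I scraped a website that requires login, and the page comes back as XML, shown below. Some teachers belong to two departments, which is where I ran into trouble. I don't need the first three lines, since they only report that the request succeeded. I need to turn the rest into a data frame (or a list, or JSON).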

My code:

ID <- xpathApply(xml, "//teacher[@id]") 
ID_unlist <- unlist(ID) 
matrix <- as.data.frame(matrix(ID_unlist),nrow= 2, byrow=TRUE) 

Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : 
    first argument must be atomic 

XML:

<result status="success">
    <code>1</code>
    <note>success</note>
    <teacherList>
        <teacher id="D95">
            <name>Mary</name>
            <department id="420">
                <name>Math</name>
            </department>
            <department id="421">
                <name>Statistics</name>
            </department>
        </teacher>
        <teacher id="D73">
            <name>Adam</name>
            <department id="412">
                <name>English</name>
            </department>
        </teacher>
    </teacherList>
</result>

And I expect the result to be:

t_id  teacher  d_id  department
D95   Mary     420   Math
D95   Mary     421   Statistics
D73   Adam     412   English

Answer


Probably not the most efficient way, but it works.

library(XML)

# `content` is assumed to hold the XML shown in the question
content_list <- XML::xmlToList(content)

# Build one matrix per teacher (one row per department), then stack them
df <- as.data.frame(do.call(rbind,
    lapply(content_list$teacherList, function(teacher) {
        departments <- do.call(rbind, teacher[names(teacher) == "department"])
        unname(cbind(teacher$.attrs, teacher$name, departments))
    })
))
colnames(df) <- c("id", "teacher", "department", "did")


   id teacher department did
1 D95    Mary       Math 420
2 D95    Mary Statistics 421
3 D73    Adam    English 412
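
The error in the question most likely comes from the fact that xpathApply returns a list of XML node objects, and unlist does not turn those into atomic values. As an alternative to the answer's approach, here is a minimal sketch that selects every department node and reads the teacher fields from its parent, which yields one row per teacher/department pair in the column order the question asks for. The names xml_text, doc and departments are made up for this illustration, and the XML is pasted inline as a string rather than fetched from the site.

```r
library(XML)

# Assumption: the XML from the question, pasted inline as a string
xml_text <- '<result status="success">
  <code>1</code>
  <note>success</note>
  <teacherList>
    <teacher id="D95">
      <name>Mary</name>
      <department id="420"><name>Math</name></department>
      <department id="421"><name>Statistics</name></department>
    </teacher>
    <teacher id="D73">
      <name>Adam</name>
      <department id="412"><name>English</name></department>
    </teacher>
  </teacherList>
</result>'

doc <- xmlParse(xml_text, asText = TRUE)

# One node per <department>, so a teacher with two departments appears twice
departments <- getNodeSet(doc, "//teacher/department")

# Teacher fields come from each department's parent <teacher> node
df <- data.frame(
  t_id       = sapply(departments, function(d) xmlGetAttr(xmlParent(d), "id")),
  teacher    = sapply(departments, function(d) xmlValue(xmlParent(d)[["name"]])),
  d_id       = sapply(departments, function(d) xmlGetAttr(d, "id")),
  department = sapply(departments, function(d) xmlValue(d[["name"]])),
  stringsAsFactors = FALSE
)

print(df)
```

Because every column is built with sapply over plain strings, the columns should come out as ordinary character vectors. Note that a teacher with no department at all would be dropped by this approach, since rows are driven by the department nodes.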