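Values assigned to the wrong node when converting XML to a data frame
I have thousands of XML files that I am trying to convert to R data frames. Each XML file may have different nodes, different values for each node, a different structure, and so on, so I am trying to do this in a way that does not require spelling out the structure of each individual file. However, I cannot get the values assigned to the correct tags.
Suppose I have the following XML in a file named "dat.xml":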
<?xml version="1.0" encoding="UTF-8"?>
<HH_V2 id="HH_V2">
  <start>2017-01-30T11:31:56.811Z</start>
  <end>2017-01-30T12:08:19.489Z</end>
  <today>2017-01-30</today>
  <deviceid>351569060022943</deviceid>
  <time_st>2017-01-30</time_st>
  <int_name>21</int_name>
  <superv>4</superv>
  <region>2</region>
  <new_ea_flag>0</new_ea_flag>
  <unique_id>c3d5c37d-b5c6-4b9d-a922-3b4f5be0e5ac</unique_id>
  <village>Boana</village>
  <hh_serial>71710003101</hh_serial>
  <hh_serial2>71710003101</hh_serial2>
  <id_consent>
    <iconsent>
      <iconsentlong />
    </iconsent>
    <consent>1</consent>
  </id_consent>
  <meta>
    <instanceID>uuid:ff93ead6-77b3-4c14-be7c-cbeb520ce0d7</instanceID>
  </meta>
</HH_V2>
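Using the XML file above and the script below, my data frame contains a column named "meta" with the value uuid:ff93ead6-77b3-4c14-be7c-cbeb520ce0d7. However, I expected (and want) it to contain a column named "instanceID" with that same value, since instanceID is the tag that immediately surrounds the value. The same thing happens with other nested nodes. Does anyone have any suggestions?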
# Load packages
library(dplyr)
library(XML)
# Convert xml file to list of lists
temp_list <- "dat.xml" %>% XML::xmlParse() %>% XML::xmlToList()
# Unlist and store content as a single column with row
# names for each variable in that node and the value of
# the variable in a single column.
for (j in 1:length(temp_list)) {
  temp_list[[j]] <- temp_list[[j]] %>%
    unlist(recursive = TRUE) %>%
    as.data.frame(stringsAsFactors = FALSE)
}
# Each file is now a list of data frames comprised of
# 1 column of values and row names for each variable. So
# we bind these in order of their appearance in the list
# of data frames
temp_list <- do.call(rbind, temp_list)
# Since we want each row to be a column and each column
# to be a variable ('wide' format), we transpose the
# dataframe to produce a single row for each instance
# of the submitted form
t(temp_list) %>% as.data.frame(stringsAsFactors = FALSE)
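Does 'as.data.frame(as.list(unlist(temp_list, recursive = TRUE)), stringsAsFactors = FALSE)' (on your unmodified 'temp_list') do it? A small inconvenience: everything is converted to character, and the names are prefixed with the parent node name, e.g. '"meta.instanceID"' where you wanted '"instanceID"'. –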
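A minimal sketch of the suggestion in the comment above. To keep it runnable without the XML package, it uses a hand-built nested list as a stand-in for what XML::xmlToList() would return for a cut-down dat.xml; the list contents and the final prefix-stripping step are illustrative assumptions, not part of the original script.

```r
# Hypothetical stand-in for the result of XML::xmlToList() on a reduced dat.xml
temp_list <- list(
  start      = "2017-01-30T11:31:56.811Z",
  village    = "Boana",
  id_consent = list(iconsent = list(iconsentlong = NULL), consent = "1"),
  meta       = list(instanceID = "uuid:ff93ead6-77b3-4c14-be7c-cbeb520ce0d7"),
  .attrs     = c(id = "HH_V2")
)

# Flatten the whole nested list at once; names become "parent.child"
flat <- unlist(temp_list, recursive = TRUE)

# One row, one column per leaf value (all values are character)
wide <- as.data.frame(as.list(flat), stringsAsFactors = FALSE)

# Optional: drop everything up to the last period to keep only the leaf tag
names(wide) <- sub("^.*\\.", "", names(wide))
names(wide)
# "start" "village" "consent" "instanceID" "id"
```

Stripping the prefix is only safe when leaf tag names are unique within a file and contain no periods themselves; if two branches share a leaf name, the columns would collide, so keeping the prefixed names may be the safer default.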
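It works on the toy dataset; I will test it on the full data later. Care to explain why this works when my code does not? The inconvenience is fine (the tag names contain no periods), so I can keep just the substring. – user3614648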
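On the "why": a likely explanation is the way rbind() names rows when it is called through do.call() on a named list of data frames. A piece with a single row takes the name of the list element (here "meta") and its own row name ("instanceID") is discarded, while a piece with several rows gets "element.rowname". The sketch below reproduces this with hand-built data frames; the 'pieces' list and the 'both' element are hypothetical and only mimic the shape of what the loop in the question produces.

```r
# Hypothetical one-column pieces, shaped like the data frames built in the loop
pieces <- list(
  village = data.frame(value = "Boana", stringsAsFactors = FALSE),
  meta    = data.frame(value = "uuid:ff93ead6-77b3-4c14-be7c-cbeb520ce0d7",
                       row.names = "instanceID", stringsAsFactors = FALSE),
  both    = data.frame(value = c("a", "b"), row.names = c("x", "y"),
                       stringsAsFactors = FALSE)
)

# Same call as in the question: bind the pieces in list order
bound <- do.call(rbind, pieces)

# The single-row "meta" piece loses its "instanceID" row name
rownames(bound)
# "village" "meta" "both.x" "both.y"
```

Calling unlist() once on the whole nested list avoids this step entirely, because the names are built from the full path to each leaf rather than from the list element names alone.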