XML to data.frame: move values to different columns based on the value in a second column
TL;DR: there is a simple question at the bottom.
I am trying to get an XML file into a usable table in R.
<toes copyright='(C)' version='1.1'>
<generated date='2017-01-21 07:45:04' timestamp='1485006304'/>
<description> Active TOE vehicle levels and adjustments for the current
campaign up to the RDP cycle in progress. c0 = the cycle 0 capacity, adj
= comma-separated list of cycle:capacity adjustments, cur = current
capacity </description>
<defaults><def att='adj' value=''/></defaults>
<r toe="deairfor" veh="22" c0="30" cur="30"/>
<r toe="deairfor" veh="23" c0="40" cur="20" adj="1:35,2:20"/>
<r toe="deairfor" veh="26" c0="2" cur="2" adj="2:10,3:30"/>
</toes>
The format I expect looks like this:
"TOE" "Veh" "c0" "cur" "adj1" "adj2" "adj3"
"deairfor" 22 30 30 NA NA NA
"deairfor" 23 40 20 35 20 NA
"deairfor" 26 2 2 NA 10 30
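I have zero experience importing XML files, but I suspect this file is not formatted properly, because I have not come across any XML examples that carry their data inside the tag, as in <r toe="... data ..."/>. I have been able to extract the data with the following: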
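library(XML)
library(gtools)  # provides smartbind()
source <- "http://wiretap.wwiionline.com/xml/toes.sheet.xml"
xmlfile <- xmlTreeParse(source, useInternalNodes = TRUE)
nodes <- getNodeSet(xmlfile, "/toes//r")
Df1 <- NULL
Df2 <- NULL  # must exist before the first smartbind() call
for (i in seq_along(nodes)) {
  # attributes of one <r> node as a 1-row character matrix
  Df1 <- t(xmlToList(nodes[[i]]))
  # append the row; smartbind() fills missing columns (e.g. adj) with NA
  Df2 <- smartbind(Df2, Df1[1, ])
}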
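I could only extract one row at a time, so I used the loop above to bind the rows together afterwards. Both Df1 and Df2 have to be initialized first, otherwise it errors at i = 1. A different approach would probably be much easier, but I could not get one to work.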
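Data carried in attributes is valid XML, and the attributes of all <r> nodes can be collected without a row-by-row bind. A minimal sketch, assuming the XML package from the question and an inline copy of the sample (header attributes left out) instead of the live URL; the names xml_text, attr_list and toes_df are made up for illustration:

```r
library(XML)

# Inline copy of the sample rows from the question, so the sketch
# does not depend on what the live URL currently serves.
xml_text <- '<toes>
  <r toe="deairfor" veh="22" c0="30" cur="30"/>
  <r toe="deairfor" veh="23" c0="40" cur="20" adj="1:35,2:20"/>
  <r toe="deairfor" veh="26" c0="2" cur="2" adj="2:10,3:30"/>
</toes>'

doc   <- xmlParse(xml_text, asText = TRUE)
nodes <- getNodeSet(doc, "/toes//r")

# xmlAttrs() returns one named character vector per node
attr_list <- lapply(nodes, xmlAttrs)

# Not every <r> has an adj attribute, so align on the union of names;
# indexing by a missing name yields NA
cols <- unique(unlist(lapply(attr_list, names)))
rows <- lapply(attr_list, function(a) unname(a[cols]))

toes_df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(toes_df) <- cols
print(toes_df)
```

All columns come out as character here, with NA where an <r> node has no adj attribute.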
This leaves me with a data frame Df2 in which every variable is a factor (why?):
"TOE" "Veh" "c0" "cur" "adj"
deairfor 22 30 30 NA
deairfor 23 40 20 1:35,2:20
deairfor 26 2 2 2:10,3:30
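On the "why?": before R 4.0, data.frame() defaulted to stringsAsFactors = TRUE, so character data became factors unless told otherwise. That smartbind() builds its rows this way is an assumption here, not something checked against the gtools source. Either way, the columns can be converted after the fact. A small sketch on a stand-in for Df2 built by hand:

```r
# Stand-in for Df2 as described in the question: every column is a factor
Df2 <- data.frame(
  toe = c("deairfor", "deairfor", "deairfor"),
  veh = c("22", "23", "26"),
  c0  = c("30", "40", "2"),
  cur = c("30", "20", "2"),
  adj = c(NA, "1:35,2:20", "2:10,3:30"),
  stringsAsFactors = TRUE
)

# Go through character first; as.numeric() on a factor would return
# the internal level codes rather than the printed values
Df2[] <- lapply(Df2, as.character)

# Let type.convert() guess a type per column; as.is = TRUE keeps
# non-numeric columns as character instead of turning them into factors
Df2[] <- lapply(Df2, type.convert, as.is = TRUE)
str(Df2)
```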
So now the difficulty lies in this "adj" column. I can split it with the following:
library(tidyr)  # provides separate()
# split "1:35,2:20" into up to three "cycle:capacity" pieces
Df2 <- separate(data = Df2, col = adj, into = c("adj1", "adj2", "adj3"), sep = ",")
# split each piece into its cycle number and its capacity value
Df2 <- separate(data = Df2, col = adj1, into = c("adj1", "adj1value"), sep = ":")
Df2 <- separate(data = Df2, col = adj2, into = c("adj2", "adj2value"), sep = ":")
Df2 <- separate(data = Df2, col = adj3, into = c("adj3", "adj3value"), sep = ":")
But the cells do not end up in the right columns. Df2 now looks like this:
"TOE" "Veh" "c0" "cur" "adj1" "adj1value" "adj2" "adj2value" "adj3" "adj3value"
deairfor 22 30 30 NA NA NA NA NA NA
deairfor 23 40 20 1 35 2 20 NA NA
deairfor 26 2 2 2 10 3 30 NA NA
whereas that last row should be (once the adj*value entries are in the proper columns, adj1/adj2/adj3 can be dropped as well):
deairfor 26 2 2 NA NA 2 10 3 30
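I have tried countless ways to move these cells to the right, but I keep getting errors such as the one below (the adj* columns are character, hence the quoted "1"):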
Df2$adj3[Df2$adj1 == "1"] <- Df2$adj2
Df2$adj3value[Df2$adj1 == "1"] <- Df2$adj2value
"NAs are not allowed in subscripted assignments"
So the question is: how do I get these values into the appropriate columns?
"TOE" "Veh" "c0" "cur" "adj"
deairfor 26 2 2 2:10,3:30
should become
"TOE" "Veh" "c0" "cur" "adj1" "adj2" "adj3"
deairfor 26 2 2 NA 10 30
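The "NAs are not allowed in subscripted assignments" error quoted earlier comes up when a logical index contains NA (the row without any adj compares as NA) and the replacement is longer than one element. Wrapping the comparison in which() drops the NA, and subsetting the right-hand side with the same index keeps the lengths equal. A minimal sketch on a hand-built stand-in for the separated columns; the variable shift is made up for illustration:

```r
# Stand-in for the adj* columns of Df2 after the separate() calls
Df2 <- data.frame(
  adj1 = c(NA, "1", "2"), adj1value = c(NA, "35", "10"),
  adj2 = c(NA, "2", "3"), adj2value = c(NA, "20", "30"),
  adj3 = NA_character_,   adj3value = NA_character_,
  stringsAsFactors = FALSE
)

# Rows whose first adjustment belongs to cycle 2 sit one slot too far left.
# which() returns only the positions that are TRUE, so the NA row is skipped.
shift <- which(Df2$adj1 == "2")

# Move right to left so that nothing is overwritten before it is copied
Df2$adj3value[shift] <- Df2$adj2value[shift]
Df2$adj2value[shift] <- Df2$adj1value[shift]
Df2$adj1value[shift] <- NA

print(Df2[c("adj1value", "adj2value", "adj3value")])
```

This only covers the case shown in the sample, where the first adjustment starts at cycle 2; a row whose first adjustment is cycle 3 would need a second shift.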
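Bonus question: I get the feeling that I need more lines than necessary because the XML import at the start is not optimal. Is there a better way to do this, given my goal?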
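Given that goal, one option is to skip the separate() step and use the cycle number in each "cycle:capacity" pair as the column position directly. A minimal base R sketch under stated assumptions: parse_adj is a hypothetical helper, the data frame is a hand-built stand-in for the imported table, and cycles are assumed to run from 1 to 3 as in the sample:

```r
# Stand-in for the imported table, with adj still a single string
Df2 <- data.frame(
  toe = "deairfor", veh = c(22L, 23L, 26L), c0 = c(30L, 40L, 2L),
  cur = c(30L, 20L, 2L), adj = c(NA, "1:35,2:20", "2:10,3:30"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "2:10,3:30" becomes c(NA, 10, 30).
# Assumes every cycle number lies between 1 and n_cycles.
parse_adj <- function(x, n_cycles = 3) {
  out <- rep(NA_integer_, n_cycles)
  if (is.na(x) || !nzchar(x)) return(out)  # no adjustments for this row
  pairs <- strsplit(strsplit(x, ",", fixed = TRUE)[[1]], ":", fixed = TRUE)
  for (p in pairs) out[as.integer(p[1])] <- as.integer(p[2])
  out
}

# One row of three values per input string
adj_mat <- t(vapply(Df2$adj, parse_adj, integer(3), USE.NAMES = FALSE))
colnames(adj_mat) <- paste0("adj", 1:3)

result <- cbind(Df2[c("toe", "veh", "c0", "cur")], adj_mat)
print(result)
```

Because the cycle number picks the slot, no values have to be shifted afterwards. If the live feed ever contains more than three cycles, n_cycles and the integer(3) template would both need to grow.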
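Try some of what this post uses to build a data frame from XML and see if it works for you: http://stackoverflow.com/questions/17198658/how-to-parse-xml-to-r-data-frame – sconfluentus
Curiously, the XML you posted does not match the URL, since the page has no *adj* attributes. – Parfait
Yes, the page is updated over time; unfortunately, adj will only show up again in about two weeks. –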