strsplit a character array and convert it to a data frame at the same time

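I have what feels like a difficult data-wrangling problem and would appreciate some guidance. Below is what my current array looks like, along with the data frame I'd like to get, for a test version: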

dput(test) 
c("<play quarter=\"1\" oncourt-id=\"\" time-minutes=\"12\" time-seconds=\"0\" id=\"1\"/>", "<play quarter=\"2\" oncourt-id=\"\" time-minutes=\"10\" id=\"1\"/>") 

test 
[1] "<play quarter=\"1\" oncourt-id=\"\" time-minutes=\"12\" time-seconds=\"0\" id=\"1\"/>" 
[2] "<play quarter=\"2\" oncourt-id=\"\" time-minutes=\"10\" id=\"1\"/>" 

desired_df 
  quarter oncourt-id time-minutes time-seconds id 
1       1         NA           12            0  1 
2       2         NA           10           NA  1

There are a few problems I'm dealing with:

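The character array test shouldn't have the backslashes, but I'm having trouble removing them with gsub in this format: gsub("\", "", test).
Not every element of test has the same number of entries: note that the 2nd element has no time-seconds, so in the data frame I'd prefer that value to be NA.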
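A quick check on the backslash problem, assuming the test vector is exactly what dput(test) shows above. The \" sequences are how print() and dput() display an embedded double quote, so the strings contain no backslash characters to remove. (The attempt gsub("\", "", test) is also not valid R syntax, because \" escapes the quote that was meant to close the pattern.) cat() shows the raw text:

```r
# The question's input, exactly as printed by dput(test)
test <- c("<play quarter=\"1\" oncourt-id=\"\" time-minutes=\"12\" time-seconds=\"0\" id=\"1\"/>",
          "<play quarter=\"2\" oncourt-id=\"\" time-minutes=\"10\" id=\"1\"/>")

# print() escapes embedded double quotes, so they are displayed as \"
print(test[2])

# cat() writes the raw characters, and no backslash appears
cat(test[2], "\n")

# With fixed = TRUE the pattern "\\" is one literal backslash; nothing matches
grepl("\\", test, fixed = TRUE)
# [1] FALSE FALSE
```

If a string genuinely did contain backslashes, gsub("\\", "", x, fixed = TRUE) would strip them.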

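I've tried using strsplit(test, " ") to split on spaces first (spaces only occur between the different columns), but then I get back a list, which is hard to work with.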
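For comparison with the strsplit() route, here is a base-R sketch that avoids the awkward list by pulling out the name="value" pairs with regmatches()/gregexpr() and filling missing columns with NA. It assumes attribute values never contain an embedded double quote, and it is a regex workaround rather than a real XML parser, so the XML-package approach in the answer below is the more robust choice. Unlike that answer, it keeps the original hyphenated column names (oncourt-id, not oncourt.id).

```r
# The question's input vector
test <- c("<play quarter=\"1\" oncourt-id=\"\" time-minutes=\"12\" time-seconds=\"0\" id=\"1\"/>",
          "<play quarter=\"2\" oncourt-id=\"\" time-minutes=\"10\" id=\"1\"/>")

# Extract every name="value" pair; names may contain letters, digits, _ and -
pairs <- regmatches(test, gregexpr('[[:alnum:]_-]+="[^"]*"', test))

# Turn each element's pairs into a named character vector
rows <- lapply(pairs, function(p) {
  keys <- sub("=.*$", "", p)
  vals <- sub('^[^=]+="(.*)"$', "\\1", p)
  setNames(vals, keys)
})

# Union of all attribute names, in order of first appearance
cols <- unique(unlist(lapply(rows, names)))

# One row per element; indexing by a name that is absent yields NA
mat <- t(vapply(rows, function(r) unname(r[cols]), character(length(cols))))
colnames(mat) <- cols

# Empty attributes such as oncourt-id="" become NA as well
mat[!is.na(mat) & mat == ""] <- NA

df <- as.data.frame(mat, stringsAsFactors = FALSE)
# Convert each column from character to the simplest fitting type
df[] <- lapply(df, type.convert, as.is = TRUE)
df
```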


2017-02-25 Canovice

This looks like 'XML'? Why not consider parsing it with the 'XML' library? – salient

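You have XML there. You can parse it and then run rbindlist on the result. That is probably much easier than trying to split the name-value pairs as strings.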

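dflist <- lapply(test, function(x) { 
    # parse the XML string; the tag's attributes become named elements
    df <- as.data.frame.list(XML::xmlToList(x)) 
    # turn empty attributes such as oncourt-id="" into NA
    is.na(df) <- df == "" 
    df 
}) 

# stack the one-row data frames; fill = TRUE pads missing columns with NA
data.table::rbindlist(dflist, fill = TRUE) 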
#    quarter oncourt.id time.minutes time.seconds id 
# 1:       1         NA           12            0  1 
# 2:       2         NA           10           NA  1

Note: you will need the XML and data.table packages for this solution.


2017-02-25 01:11:39

Thanks so much for the solution, Rich, much appreciated. – Canovice
