2014-01-13 85 views
0

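Flatten nested sublists into a data.frame

I often receive data in the form of nested lists, and I end up writing assorted one-off code to flatten them into data.frames. I would like a more general solution so that I don't have to write separate code for each individual list. Here is some sample data to illustrate the problem.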

data_list <- list(structure(list(local_date_time = "2010-01-05T13:30:00", 
    value = -9999, data_quality = list(structure(list(qualifierid = 19, 
     qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality" 
)), structure(list(local_date_time = "2010-01-05T14:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality" 
)), structure(list(local_date_time = "2010-01-05T14:30:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality" 
)), structure(list(local_date_time = "2010-01-05T15:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality" 
)), structure(list(local_date_time = "2010-01-05T15:30:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality" 
)), structure(list(local_date_time = "2010-01-05T16:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality" 
)), structure(list(local_date_time = "2010-01-05T16:30:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality" 
)), structure(list(local_date_time = "2010-01-05T17:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality" 
)), structure(list(local_date_time = "2010-01-05T17:30:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality" 
)), structure(list(local_date_time = "2010-01-05T18:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
     valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"))) 

The simplest approach is, of course, to rbind the list. data.table's rbindlist is fast for large lists, like so:

library(data.table) 
rbindlist(data_list) 

But this returns:

 local_date_time value data_quality 
1: 2010-01-05T13:30:00 -9999  <list> 
2: 2010-01-05T14:00:00 -9999  <list> 
3: 2010-01-05T14:30:00 -9999  <list> 
4: 2010-01-05T15:00:00 -9999  <list> 
5: 2010-01-05T15:30:00 -9999  <list> 
6: 2010-01-05T16:00:00 -9999  <list> 
7: 2010-01-05T16:30:00 -9999  <list> 
8: 2010-01-05T17:00:00 -9999  <list> 
9: 2010-01-05T17:30:00 -9999  <list> 
10: 2010-01-05T18:00:00 -9999  <list> 

which is not ideal, because the last column is actually a nested list of 3 items. I can work around this with plyr:

library(plyr) 
result <- ldply(data_list, function(x) { 
    cbind(data.frame(t(unlist(x[1:2]))), data.frame(t(unlist(x[3])))) 
}) 

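This works fine. Is there a way to generalize this approach to lists of nested lists that may have different formats? If the list has only a single level, a simple do.call(rbind, list_name) should be enough. In this case I know that the third element contains a sublist, but often I don't know that in advance. Writing a custom wrapper for each list would get tedious.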
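For the single-level case mentioned above, here is a minimal sketch in base R. The `flat_records` data is made up for illustration and is not part of the original question. Note that calling rbind directly on bare lists gives a list-matrix rather than a data.frame, so the sketch converts each record first.

```r
# Hypothetical single-level records (illustration only)
flat_records <- list(
  list(id = 1, label = "a"),
  list(id = 2, label = "b")
)

# rbind on bare lists yields a matrix of mode "list", not a data.frame
list_matrix <- do.call(rbind, flat_records)

# Converting each record to a one-row data.frame first keeps column types
single_level <- do.call(
  rbind,
  lapply(flat_records, as.data.frame, stringsAsFactors = FALSE)
)
str(single_level)
```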

Answers

2

I came across a function called LinearizeNestedList by Akhil S Bhel (who is occasionally on SO). It "flattens" nested lists.

In your case, you want to "flatten" the sublists rather than the main list itself.

Perhaps it can be used in your case as follows:

library(devtools) 
source_gist("https://gist.github.com/mrdwab/4205477") 
# Sourcing https://gist.github.com/mrdwab/4205477/raw/1bd86c697b89de9941834882f1085c8312076e38/LinearizeNestedList.R 
# SHA-1 hash of file is dde479195258dbad9367274ceedbd5a68251478a 
x <- do.call(rbind.data.frame, lapply(data_list, LinearizeNestedList)) 
x 
#  local_date_time value data_quality.1.qualifierid 
# 2 2010-01-05T13:30:00 -9999       19 
# 21 2010-01-05T14:00:00 -9999       19 
# 3 2010-01-05T14:30:00 -9999       19 
# 4 2010-01-05T15:00:00 -9999       19 
# 5 2010-01-05T15:30:00 -9999       19 
# 6 2010-01-05T16:00:00 -9999       19 
# 7 2010-01-05T16:30:00 -9999       19 
# 8 2010-01-05T17:00:00 -9999       19 
# 9 2010-01-05T17:30:00 -9999       19 
# 10 2010-01-05T18:00:00 -9999       19 
#    data_quality.1.qualifier_description data_quality.1.valid 
# 2 Passed sanity check; see incident report IR_8    FALSE 
# 21 Passed sanity check; see incident report IR_8    FALSE 
# 3 Passed sanity check; see incident report IR_8    FALSE 
# 4 Passed sanity check; see incident report IR_8    FALSE 
# 5 Passed sanity check; see incident report IR_8    FALSE 
# 6 Passed sanity check; see incident report IR_8    FALSE 
# 7 Passed sanity check; see incident report IR_8    FALSE 
# 8 Passed sanity check; see incident report IR_8    FALSE 
# 9 Passed sanity check; see incident report IR_8    FALSE 
# 10 Passed sanity check; see incident report IR_8    FALSE 

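Fantastic! Thanks! Now the challenge is to figure out the license for including this in my package, especially since the code above has none and isn't on CRAN. – Maiasaura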

@Maiasaura, as I mentioned, I have seen Akhil on Stack Overflow, so maybe you can reach him through one of his recent questions or answers. – A5C1D2H2I1M1N2O1R2T1
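If depending on third-party code is a concern, a rough base R alternative might look like the sketch below. This is not the LinearizeNestedList implementation; it simply calls unlist on each record, which coerces every value to character, and then uses type.convert to recover the column types. The `sample_records` data is a shortened, made-up stand-in for `data_list`, and the sketch assumes every record has the same set of leaves.

```r
# Hypothetical stand-in for data_list (two records, same shape)
sample_records <- list(
  list(local_date_time = "2010-01-05T13:30:00", value = -9999,
       data_quality = list(list(qualifierid = 19,
                                qualifier_description = "Passed sanity check",
                                valid = FALSE))),
  list(local_date_time = "2010-01-05T14:00:00", value = -9999,
       data_quality = list(list(qualifierid = 19,
                                qualifier_description = "Passed sanity check",
                                valid = FALSE)))
)

# unlist() flattens any depth but coerces all leaves to character
unlisted_rows <- lapply(sample_records, function(rec) {
  as.data.frame(t(unlist(rec)), stringsAsFactors = FALSE)
})
unlisted_df <- do.call(rbind, unlisted_rows)

# Recover numeric and logical columns from their character form
unlisted_df[] <- lapply(unlisted_df, type.convert, as.is = TRUE)
str(unlisted_df)
```

The round trip through character is lossy for some values (for example, numbers with many significant digits), so this is only reasonable for simple data.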

0

A simple lapply with as.data.frame will also do, at least as long as you have only one level of nesting:

> res <- do.call(rbind, lapply(data_list, as.data.frame)) 
> str(res) 
'data.frame': 10 obs. of 5 variables: 
$ local_date_time     : Factor w/ 10 levels "2010-01-05T13:30:00",..: 1 2 3 4 5 6 7 8 9 10 
$ value        : num -9999 -9999 -9999 -9999 -9999 ... 
$ data_quality.qualifierid   : num 19 19 19 19 19 19 19 19 19 19 
$ data_quality.qualifier_description: Factor w/ 1 level "Passed sanity check; see incident report IR_8": 1 1 1 1 1 1 1 1 1 1 
$ data_quality.valid    : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
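For deeper nesting, or records whose fields differ, one possible generalization is a small recursive helper. The sketch below is not from the original thread: `flatten_list` and `flatten_to_df` are hypothetical names, the `nested_records` data is made up, and the code assumes every leaf is a length-one atomic value (NULL leaves become NA). Unnamed list elements are labelled by position, and fields missing from a record are filled with NA.

```r
# Hypothetical helper: flatten one nested list into a flat named list,
# joining the names along each path with "."
flatten_list <- function(x, prefix = NULL) {
  if (is.null(x)) x <- NA                      # treat NULL leaves as missing
  if (!is.list(x)) return(setNames(list(x), prefix))
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  out <- list()
  for (i in seq_along(x)) {
    key  <- if (nzchar(nms[i])) nms[i] else as.character(i)
    full <- if (is.null(prefix)) key else paste(prefix, key, sep = ".")
    out  <- c(out, flatten_list(x[[i]], full))
  }
  out
}

# Hypothetical helper: flatten every record and bind the rows,
# padding fields that a record lacks with NA
flatten_to_df <- function(records) {
  flat <- lapply(records, flatten_list)
  all_cols <- unique(unlist(lapply(flat, names)))
  rows <- lapply(flat, function(f) {
    f[setdiff(all_cols, names(f))] <- NA
    as.data.frame(f[all_cols], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up records with different shapes (illustration only)
nested_records <- list(
  list(local_date_time = "2010-01-05T13:30:00", value = -9999,
       data_quality = list(list(qualifierid = 19, valid = FALSE))),
  list(local_date_time = "2010-01-05T14:00:00", value = 12.5)
)
flat_df <- flatten_to_df(nested_records)
str(flat_df)
```

Unlike an unlist-based approach, this keeps each leaf's original type, at the cost of growing the result list element by element, which may be slow for very large inputs.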