2017-09-05 26 views

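R - How to dynamically add a variable number of columns to a data frame based on multiple list columns

I need to add new columns to a data frame whenever it contains list columns: if a column holds lists, one new column should be added per list element; otherwise the column should stay as it is.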

My data frame:

U_ID Value         AD CT value1    Citycode 
    1 list(`Cno`="50",`cna`="\n\rjhon\n") ia BG list(`Cno`="50")  TY 
    1 list(`Cno`="20",`cna`="guna")   AS DB list(`Cno`="\n\r20") UI 
    2 list(`Cno`="30",`cna`="rt",`cf`="ty") BN FV list(`Cno`="30")  GH 
    2 NULL         VF TY NULL     TY 
    3 list(`Cno`="\n\r30")     RR TT list(`Cno`="30")  ST 

My desired output would be:

U_ID Value         Cno cna cf  AD CT value1    Cno1   Citycode 
1  list(`Cno`="50",`cna`="\n\rjhon\n") 50 jhon NULL ia BG list(`Cno1`="50")  50    TY 
1  list(`Cno`="20",`cna`="guna")   20 guna NULL  AS DB list(`Cno1`="\n\r20") 20    UI 
2  list(`Cno`="30",`cna`="rt",`cf`="ty") 30 rt ty  BN FV list(`Cno1`="30")  30    GH 
2  NULL         NULL NULL NULL VF TY NULL     NULL   TY 
3  list(`Cno`="\n\r30")     30 NULL NULL  RR TT list(`Cno1`="30")  30    ST 

Data:

structure(list(U_ID = c(1, 1, 2, 2, 3), Value = list(structure(list(
    `Cno#` = "50", cna = "\n\rjhon\n"), .Names = c("Cno#", "cna" 
)), structure(list(`Cno#` = "50", cna = "guna"), .Names = c("Cno#", 
"cna")), structure(list(`Cno#` = "30", cna = "rt", cf = "ty"), .Names = c("Cno#", 
"cna", "cf")), "NULL", structure(list(`Cno#` = "\n\r30"), .Names = "Cno#")), 
    AD = c("ia", "AS", "BN", "VF", "RR"), CT = c("BG", "DB", 
    "FV", "TY", "TT"), Value1 = list(structure(list(`Cno#` = "50"), .Names = "Cno#"), 
     structure(list(`Cno#` = "\n\r20"), .Names = "Cno#"), 
     structure(list(`Cno#` = "30"), .Names = "Cno#"), "NULL", 
     structure(list(`Cno#` = "30"), .Names = "Cno#")), Citycode = c("TY", 
    "UI", "GH", "RY", "ST")), .Names = c("U_ID", "Value", "AD", 
"CT", "Value1", "Citycode"), row.names = c(NA, -5L), class = "data.frame") 

Have you tried anything? – Sotos


The data at the end of your post is not the same as the data at the beginning... –


Yes, I missed the value1 list variable; col1 is correct. –

Answers


Here is a solution using dplyr.

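library(dplyr)

dat %>%
  # idx numbers the rows whose Value is a list (1, 2, ...); "NULL" rows get NA
  mutate(idx = as.character(`is.na<-`(cumsum(Value != "NULL"),
                                      Value == "NULL"))) %>%
  # bind the list rows into one data frame keyed by idx, then join it back
  left_join(filter(., Value != "NULL") %>%
              pull(Value) %>%
              bind_rows(.id = "idx"),
            by = "idx") %>%
  # repeat the same two steps for Value1
  mutate(idx2 = as.character(`is.na<-`(cumsum(Value1 != "NULL"),
                                       Value1 == "NULL"))) %>%
  left_join(filter(., Value1 != "NULL") %>%
              pull(Value1) %>%
              bind_rows(.id = "idx2"),
            by = "idx2") %>%
  # drop the helper index columns
  select(-idx, -idx2)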

Here, dat is the name of your data frame.
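The `idx` helper column is probably the least obvious part of that pipeline. A minimal sketch of what it computes, using a made-up logical vector `has_list` in place of `Value != "NULL"` (the vector and its values are assumptions for illustration only):

```r
# Hypothetical stand-in for `Value != "NULL"`: TRUE where a row holds a list.
has_list <- c(TRUE, TRUE, TRUE, FALSE, TRUE)

# cumsum() numbers the list rows 1, 2, 3, ...; a non-list row just repeats
# the previous count, so `is.na<-` blanks it out.
idx <- `is.na<-`(cumsum(has_list), !has_list)

# bind_rows(.id = "idx") produces character ids, so convert before joining.
idx <- as.character(idx)

print(idx)
```

The non-missing ids line up with the row positions that `bind_rows(.id = "idx")` assigns to the filtered list, which is what lets the `left_join` put each expanded row back in the right place.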

Result:

U_ID   Value AD CT Value1 Citycode Cno#.x  cna cf Cno#.y 
1 1 50, \n\rjhon\n ia BG  50  TY  50 \n\rjhon\n <NA>  50 
2 1  50, guna AS DB \n\r20  UI  50  guna <NA> \n\r20 
3 2  30, rt, ty BN FV  30  GH  30   rt ty  30 
4 2   NULL VF TY NULL  RY <NA>  <NA> <NA> <NA> 
5 3   \n\r30 RR TT  30  ST \n\r30  <NA> <NA>  30 

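Thanks for the quick response, but my question is how to check for and add the new columns dynamically: my data frame may have more than 50 list columns (Value1, Value2, ..., Value50). I need to detect whether a column is a list and add the new columns automatically, rather than handling Value1, Value2, ... by hand. –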


Edit
I have replaced my answer with one that accounts for multiple such list columns.


Here is a possible base R approach:

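# df is the data frame from the question
na_if_null <- function(x) if (is.null(x)) NA else x

# For every list column, build one character vector per element name
new_cols <- lapply(
  Filter(is.list, df),
  function(list_col) {
    # all element names that occur anywhere in this column
    names_ <- setNames(nm = unique(do.call(c, lapply(list_col, names))))
    # extract each name from every row; missing entries become NA
    lapply(names_, function(name) sapply(list_col, function(x)
      trimws(na_if_null(as.list(x)[[name]]))))
  }
)

# Append the new columns (named <column>.<element>) to the original data
res <- do.call(
  data.frame,
  c(
    list(df, check.names = FALSE, stringsAsFactors = FALSE),
    do.call(c, new_cols)
  )
)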

# U_ID   Value AD CT Value1 Citycode Value.Cno# Value.cna Value.cf Value1.Cno# 
# 1 1 50, \n\rjhon\n ia BG  50  TY   50  jhon  <NA>   50 
# 2 1  50, guna AS DB \n\r20  UI   50  guna  <NA>   20 
# 3 2  30, rt, ty BN FV  30  GH   30  rt  ty   30 
# 4 2   NULL VF TY NULL  RY  <NA>  <NA>  <NA>  <NA> 
# 5 3   \n\r30 RR TT  30  ST   30  <NA>  <NA>   30 
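To see the individual steps in isolation, here is a small sketch on a made-up data frame. The names `toy`, `info` and `get_field` are hypothetical and not part of the answer above; the sketch only mirrors the same idea of detecting list columns, collecting the element names, and extracting one name per row.

```r
# Toy data frame (made up for illustration) with one list column.
toy <- data.frame(id = 1:3, stringsAsFactors = FALSE)
toy$info <- list(list(a = " 1 ", b = "x"), "NULL", list(a = "2"))

# Step 1: which columns are list columns?
is_list_col <- vapply(toy, is.list, logical(1))

# Step 2: every element name that occurs anywhere in the list column.
all_names <- unique(unlist(lapply(toy$info, names)))

# Step 3: pull one name out of every row; missing entries become NA.
get_field <- function(x, name) {
  value <- as.list(x)[[name]]
  if (is.null(value)) NA_character_ else trimws(value)
}
a_values <- vapply(toy$info, get_field, character(1), name = "a")

print(is_list_col)
print(all_names)
print(a_values)
```

Using `vapply` with an explicit `character(1)` template is a design choice of this sketch: it fails loudly if an element unexpectedly holds more than one value, whereas `sapply` would silently return a list.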


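Thanks for your reply; the code works fine on the data above, but I have two questions. 1. Slowness: I have only 3,000 rows and 59 columns (4 of them list columns), and it takes almost 2 minutes to run. Is there any way to reduce the execution time? 2. The code above works for list columns; could you help me do the same for nested list columns and for columns that are lists of arrays? –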
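On the nested-list part of that question, one possible starting point (a sketch under assumptions, not something taken from the answers in this thread) is to flatten each element with base R's `unlist()` before extracting fields by name. The `nested` value below is made up for illustration.

```r
# Made-up nested element, for illustration only.
nested <- list(Cno = "50", addr = list(city = "TY", zip = c("1", "2")))

# unlist() recurses into sub-lists and vectors and joins the names with ".";
# unnamed vector entries get a numeric suffix.
flat <- unlist(nested)
print(names(flat))

# Back to a flat named list, the shape a name-based extraction expects.
flat_list <- as.list(flat)
print(flat_list[["addr.city"]])
```

Note that `unlist()` coerces everything to a common type, so mixed numeric and character entries all come back as character.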


I believe this gives exactly your desired output:

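library(dplyr)
library(purrr)  # map()
library(tidyr)  # unnest(), spread()

df1 %>%
  # expand Value: one column per element name, joined back to the full data
  left_join(df1 %>%
              filter(Value != "NULL") %>%
              mutate(Value_ = map(Value, unlist), vnames = map(Value_, names)) %>%
              unnest(Value_, vnames) %>%
              spread(vnames, Value_) %>%
              rename(Cno = `Cno#`)) %>%
  # expand Value1 into a numeric Cno1 column
  left_join(df1 %>%
              filter(Value1 != "NULL") %>%
              mutate(Cno1 = map(Value1, ~as.numeric(unlist(.x)))) %>%
              select(-Value, -Value1)) %>%
  # put the columns in the requested order
  select(U_ID, Value, Cno, cna, cf, AD, CT, Value1, Cno1, Citycode)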

# U_ID   Value Cno  cna cf AD CT Value1 Cno1 Citycode 
# 1 1 50, \n\rjhon\n  50 \n\rjhon\n <NA> ia BG  50 50  TY 
# 2 1  50, guna  50  guna <NA> AS DB \n\r20 20  UI 
# 3 2  30, rt, ty  30   rt ty BN FV  30 30  GH 
# 4 2   NULL <NA>  <NA> <NA> VF TY NULL NULL  RY 
# 5 3   \n\r30 \n\r30  <NA> <NA> RR TT  30 30  ST 