2017-09-06

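Adding rows to a data frame so that each unique value is repeated n times

I'm using R to analyze a CSV file of about 100,000 rows that looks like the sample below. I'm very new to R, so I'd appreciate any help.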

Here is my table:

Row1 -> Group, Position, Frequency 
Row2 -> 192, 1, 0.2 
Row3 -> 192, 2, 0.3 
Row4 -> 192, 3, 0.1 
Row5 -> 193, 4, 0.5 
Row6 -> 193, 5, 0.6 
Row7 -> 194, 6, 0.2 
Row8 -> 194, 7, 0.4 
Row9 -> 195, 8, 0.9 
Row10 -> 196, 9, 0.8 

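I want each value in the Group column to be repeated exactly three times. 192 is repeated three times, but 193 and 194 are repeated only twice, and 195 and 196 appear only once. For any Group value that occurs fewer than three times, I want to add rows until it occurs exactly three times, leaving the other columns of the added rows empty (NA). The final result should look like this: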

Row1 -> Group, Position, Frequency 
Row2 -> 192, 1, 0.2 
Row3 -> 192, 2, 0.3 
Row4 -> 192, 3, 0.1 
Row5 -> 193, 4, 0.5 
Row6 -> 193, 5, 0.6 
Row7 -> 193, NA, NA 
Row8 -> 194, 6, 0.2 
Row9 -> 194, 7, 0.4 
Row10 -> 194, NA, NA 
Row11 -> 195, 8, 0.9 
Row12 -> 195, NA, NA 
Row13 -> 195, NA, NA 
Row14 -> 196, 9, 0.8 
Row15 -> 196, NA, NA 
Row16 -> 196, NA, NA 
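A quick way to see how much padding each group needs is to count the Group values with base R's `table()`. The sketch below rebuilds the sample data inline; `n`, `counts` and `to_add` are illustrative names, not part of the question.

```r
# Sample data from the question
df <- data.frame(
  Group     = c(192, 192, 192, 193, 193, 194, 194, 195, 196),
  Position  = 1:9,
  Frequency = c(0.2, 0.3, 0.1, 0.5, 0.6, 0.2, 0.4, 0.9, 0.8)
)

n <- 3                     # target number of rows per Group value
counts <- table(df$Group)  # 192: 3, 193: 2, 194: 2, 195: 1, 196: 1
to_add <- n - counts       # padding rows needed for each group
sum(to_add)
#> [1] 6
```

Nine existing rows plus six padding rows gives the 15 rows of the desired result. This assumes no group already has more than `n` rows; such a group would show a negative count here.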

Answers

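# Split by Group, then build a 3-row data frame for each group.
# Indexing past the end of a vector (e.g. a$Position[1:3] when the
# group has only 2 rows) returns NA, which supplies the padding.
# Note: groups with more than 3 rows are truncated to their first 3.
do.call(rbind, lapply(split(df, df$Group), function(a){ 
     data.frame(Group = rep(a$Group[1], 3), 
        Position = a$Position[1:3], 
        Frequency = a$Frequency[1:3]) 
})) 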
#  Group Position Frequency 
#192.1 192  1  0.2 
#192.2 192  2  0.3 
#192.3 192  3  0.1 
#193.1 193  4  0.5 
#193.2 193  5  0.6 
#193.3 193  NA  NA 
#194.1 194  6  0.2 
#194.2 194  7  0.4 
#194.3 194  NA  NA 
#195.1 195  8  0.9 
#195.2 195  NA  NA 
#195.3 195  NA  NA 
#196.1 196  9  0.8 
#196.2 196  NA  NA 
#196.3 196  NA  NA 

DATA

df = structure(list(Group = c(192L, 192L, 192L, 193L, 193L, 194L, 
194L, 195L, 196L), Position = 1:9, Frequency = c(0.2, 0.3, 0.1, 
0.5, 0.6, 0.2, 0.4, 0.9, 0.8)), .Names = c("Group", "Position", 
"Frequency"), class = "data.frame", row.names = c(NA, -9L)) 

Thank you so much! –
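The `[1:3]` indexing in the first answer hard-codes three rows and silently drops any rows beyond the third in a larger group. Below is a minimal sketch of a generalization, assuming you want to pad each group to at least `n` rows while keeping larger groups intact; `pad_groups` is a hypothetical helper name, not an existing function.

```r
# Same data as the DATA section of the first answer
df <- data.frame(
  Group     = c(192L, 192L, 192L, 193L, 193L, 194L, 194L, 195L, 196L),
  Position  = 1:9,
  Frequency = c(0.2, 0.3, 0.1, 0.5, 0.6, 0.2, 0.4, 0.9, 0.8)
)

# Hypothetical helper: pad every Group to at least n rows with NA.
# Groups that already have n or more rows are returned unchanged.
pad_groups <- function(df, n = 3) {
  pieces <- lapply(split(df, df$Group), function(a) {
    k <- max(n - nrow(a), 0)         # padding rows this group needs
    if (k == 0) return(a)
    pad <- a[rep(NA_integer_, k), ]  # k all-NA rows, same column types
    pad$Group <- a$Group[1]          # fill in the group key
    rbind(a, pad)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL              # drop "192.1"-style row names
  out
}

nrow(pad_groups(df, n = 3))
#> [1] 15
```

Indexing a data frame with integer `NA` row indices returns all-`NA` rows that keep the original column types, which is what builds the padding here.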

With the tidyverse, you can use tidyr::complete to add the missing combinations as new rows:

library(tidyverse) 

df <- tibble(Row = c("Row2", "Row3", "Row4", "Row5", "Row6", "Row7", "Row8", "Row9", "Row10"), 
       Group = c(192, 192, 192, 193, 193, 194, 194, 195, 196), 
       Position = 1:9, 
       Frequency = c(0.2, 0.3, 0.1, 0.5, 0.6, 0.2, 0.4, 0.9, 0.8)) 

df_filled <- df %>% 
    group_by(Group) %>% 
    mutate(i = row_number()) %>% 
    complete(i = 1:3) 

df_filled 
#> # A tibble: 15 x 5 
#> # Groups: Group [5] 
#> Group  i Row Position Frequency 
#> <dbl> <int> <chr> <int>  <dbl> 
#> 1 192  1 Row2  1  0.2 
#> 2 192  2 Row3  2  0.3 
#> 3 192  3 Row4  3  0.1 
#> 4 193  1 Row5  4  0.5 
#> 5 193  2 Row6  5  0.6 
#> 6 193  3 <NA>  NA  NA 
#> 7 194  1 Row7  6  0.2 
#> 8 194  2 Row8  7  0.4 
#> 9 194  3 <NA>  NA  NA 
#> 10 195  1 Row9  8  0.9 
#> 11 195  2 <NA>  NA  NA 
#> 12 195  3 <NA>  NA  NA 
#> 13 196  1 Row10  9  0.8 
#> 14 196  2 <NA>  NA  NA 
#> 15 196  3 <NA>  NA  NA 
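The helper column `i` and the grouping both remain in `df_filled`. The sketch below runs the same pipeline with the target count held in a variable and returns an ungrouped tibble with only the original columns; `n_rep` is an illustrative name, and the `Row` column is left out for brevity.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  Group     = c(192, 192, 192, 193, 193, 194, 194, 195, 196),
  Position  = 1:9,
  Frequency = c(0.2, 0.3, 0.1, 0.5, 0.6, 0.2, 0.4, 0.9, 0.8)
)

n_rep <- 3  # target number of rows per group

df_filled <- df %>%
  group_by(Group) %>%
  mutate(i = row_number()) %>%     # position of each row within its group
  complete(i = seq_len(n_rep)) %>% # add missing i values as NA rows
  ungroup() %>%
  select(-i)                       # drop the helper index column
```

Because `complete()` adds missing combinations without removing existing rows, a group that already has more than `n_rep` rows keeps all of them.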

Simple and elegant – PoGibas


Thanks a lot! This was very helpful :-) –
