2017-08-01 44 views
1

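Add a row for each group with a missing value

Each basket can hold a total of 10 fruits. For each basket, if the counts sum to 10 and a fruit is missing, I want to add a row for that basket saying that the count of that fruit is 0. Below is the code that generates the data frame.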

Basket <- c("A","A","B","B","C","C","C") 
Fruit <- c("Apple","Orange","Apple","Guava","Orange","Apple", "Guava") 
count <- c("5","5","7","3","2","6","4") 
data <- data.frame(Basket,Fruit,count) 

    Basket Fruit count 
1  A Apple  5 
2  A Orange  5 
3  B Apple  7 
4  B Guava  3 
5  C Orange  2 
6  C Apple  6 
7  C Guava  4 

I basically want it to look like this:

  Basket  Fruit count
1      A  Apple     5
2      A Orange     5
3      A  Guava     0
4      B  Apple     7
5      B Orange     0
6      B  Guava     3
7      C Orange     2
8      C  Apple     6
9      C  Guava     4

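I'm not entirely sure whether a loop would be an efficient approach, but I'm open to suggestions. The goal is to get an accurate average for each fruit across the groups.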

Answers

1

Convert your data.frame to wide format, filling it with 0 instead of NA, then convert it back to long format:

count <- c(5,5,7,3,2,6,4)  # should be numeric, not strings 
data <- data.frame(Basket,Fruit,count) 

d1 <- tidyr::spread(data, Fruit, count, fill = 0) 
d2 <- tidyr::gather(d1, Fruit, count, -Basket) 
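
As a possible alternative to the wide/long round trip, `tidyr::complete()` can add the missing combinations directly. The following is only a sketch: it rebuilds the toy data to match the table printed in the question (basket B holding Guava is an assumption taken from that table), and the names `filled` and `fruit_means` are made up for illustration.

```r
library(tidyr)

# Toy data matching the printed table in the question (assumed: basket B holds Guava).
data <- data.frame(
  Basket = c("A", "A", "B", "B", "C", "C", "C"),
  Fruit  = c("Apple", "Orange", "Apple", "Guava", "Orange", "Apple", "Guava"),
  count  = c(5, 5, 7, 3, 2, 6, 4),
  stringsAsFactors = FALSE
)

# complete() adds every Basket/Fruit combination that is absent
# and fills the count of the new rows with 0.
filled <- complete(data, Basket, Fruit, fill = list(count = 0))

# Per-fruit mean over all baskets, now including the zero rows.
fruit_means <- tapply(filled$count, filled$Fruit, mean)
```

With the zero rows present, each fruit's mean is taken over all three baskets rather than only the baskets in which it appears, which is the stated goal of the question.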
+0

Nvm. This works! – rockboy23

+0

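The complaint about mismatched attributes is probably due to the character variables being treated as factors. Try adding 'stringsAsFactors = FALSE' to the 'data.frame' call. Or better yet, use 'tibble::data_frame' instead. –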

0
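data <- data.frame(Basket, Fruit, count, stringsAsFactors = FALSE)
# Build the grid from the distinct values only; passing the raw columns
# would create one grid row per pair of data rows, duplicates included.
full <- merge(
    data,
    expand.grid(
        Basket = unique(data$Basket),
        Fruit = unique(data$Fruit),
        stringsAsFactors = FALSE
    ),
    all.y = TRUE
)
full$count <- ifelse(is.na(full$count), 0, full$count)  # absent combinations get a count of 0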
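
The size of the grid produced by `expand.grid()` is the product of the lengths of the vectors passed to it, so it can be worth checking that size before merging. A small illustration on toy data shaped like the question's (the names `grid_raw` and `grid_unique` are hypothetical):

```r
Basket <- c("A", "A", "B", "B", "C", "C", "C")
Fruit  <- c("Apple", "Orange", "Apple", "Guava", "Orange", "Apple", "Guava")

# One grid row per pair of input elements: 7 * 7 = 49 rows, with duplicates.
grid_raw <- expand.grid(Basket = Basket, Fruit = Fruit,
                        stringsAsFactors = FALSE)

# One grid row per distinct pair: 3 * 3 = 9 rows.
grid_unique <- expand.grid(Basket = unique(Basket), Fruit = unique(Fruit),
                           stringsAsFactors = FALSE)

nrow(grid_raw)     # 49
nrow(grid_unique)  # 9
```

On a data frame with many rows the first form grows with the square of the row count, while the second grows only with the number of distinct baskets times the number of distinct fruits.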
+0

Quinn, thanks for your answer. It does work, but I realized that in my dataset Basket is a GUID, so it is impossible to list the values out. – rockboy23

+0

What about the change I just made? –

+0

My data frame is too large for this to work. I get an error saying: Error: cannot allocate vector of size 405.1 GB – rockboy23

1

I know there are the spread and gather functions from the tidyr package:

library(tidyr) 
data <- data %>% 
    spread(Fruit, count, fill = 0) %>% 
    gather(Fruit, count, -Basket) 

To fill with 0, the count values must be integers, not factors. For that you can use:

data$count <- as.integer(as.character(data$count))  # as.character() first, so a factor yields its labels 
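
When the column is a factor, the conversion route matters. A short illustration using the counts from the question (the names `count_factor`, `codes` and `values` are made up for this sketch):

```r
count_factor <- factor(c("5", "5", "7", "3", "2", "6", "4"))

# as.integer() on a factor returns the internal level codes, not the labels.
codes <- as.integer(count_factor)

# Going through as.character() first recovers the numbers that were typed in.
values <- as.integer(as.character(count_factor))

codes   # 4 4 6 2 1 5 3
values  # 5 5 7 3 2 6 4
```

The codes are positions in the sorted level set ("2", "3", "4", "5", "6", "7"), which is why they differ from the original counts.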
0

The count is only correct if a basket is missing just one fruit level; otherwise this should work as expected.

library(dplyr)  # provides filter(), select(), arrange() and %>%

Basket <- c("A","A","B","B","C","C","C") 
Fruit <- c("Apple","Orange","Apple","Guava","Orange","Apple", "Guava") 
count <- c(5,5,7,3,2,6,4) 
data <- data.frame(Basket,Fruit,count, stringsAsFactors = FALSE) 

fruit_levels <- levels(as.factor(data$Fruit)) 
append_df <- data.frame(Basket = NA, Fruit = NA, count = NA) 

for(i in levels(as.factor(data$Basket))){ 
    temp_df <- filter(data, Basket == i) 
    temp_count <- 10 - sum(temp_df$count) 
    if(length(levels(as.factor(temp_df$Fruit))) != length(fruit_levels)){ 

     temp_fruit <- cbind.data.frame(fruit = fruit_levels, count = ifelse(fruit_levels %in% temp_df$Fruit, 0, 1)) 
     temp_fruit2 <- filter(temp_fruit, count == 1) %>% select(fruit) 
     temp_fruit3 <- temp_fruit2[,1] %>% as.character() 

     temp_df_to_append <- data.frame(Basket = i, Fruit = temp_fruit3, count = temp_count) 
     append_df <- rbind.data.frame(append_df, temp_df_to_append) 
    } 
} 

data <- rbind.data.frame(data, append_df[-1,]) %>% arrange(Basket)