Subsetting in R, and filtering out rows if a subset of a column has all-zero values

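This question is a variation on a question I asked (very) recently here. (Sorry for posting two similar questions; I realised that what I asked wasn't quite right, but I thought I'd leave the original in case it's useful to someone in the future, and post this question separately.)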

I have a set of data that looks like this, slightly modified from the previous question:

  Category     Item Shop1 Shop2 Shop3
1    Fruit   Apples     4     6     0
2    Fruit  Oranges     0     2     7
3      Veg Potatoes     0     1     0
4      Veg   Onions     0     2     8
5      Veg  Carrots     0     1     3
6    Dairy  Yoghurt     1     5     9
7    Dairy     Milk     0     1     0
8    Dairy   Cheese     0     0     7

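I want to filter my data so that I only keep categories that are sold by all shops: if a shop has no sales at all for an entire category, then I want to filter that category out. In this example, the Veg category would be filtered out, because Shop1 has no Veg sales.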

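To solve this, I tried changing the suggestion from my previous question from FUN = any to FUN = all, but that didn't work and threw an error every time, and I'm not sure what else to try.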

I'd be grateful for any help you can offer.

Source

2017-07-31 Rose

You could try taking the sum of each subset and filtering it out if the sum equals 0. I can see this being done with the 'dplyr' package. – spicypumpkin
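The idea in this comment can be sketched in base R (rather than dplyr, which the commenter had in mind). This is only an illustration, assuming the data frame is called df and the shop columns are columns 3 to 5; the names totals, keep and result are made up for the example.

```r
# Example data from the question
df <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item     = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
               "Yoghurt", "Milk", "Cheese"),
  Shop1    = c(4, 0, 0, 0, 0, 1, 0, 0),
  Shop2    = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3    = c(0, 7, 0, 8, 3, 9, 0, 7),
  stringsAsFactors = FALSE
)

# Total sales per category (rows) and per shop (columns)
totals <- rowsum(df[3:5], df$Category)
totals

# Keep a category only if none of its shop totals is zero
keep <- rownames(totals)[rowSums(totals == 0) == 0]
result <- df[df$Category %in% keep, ]
result
```

Printing totals first makes it easy to see which category/shop combination triggers the removal before any rows are dropped.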

Here is an idea with colSums,

ind <- colSums(sapply(split(df[3:5], df$Category), function(i) colSums(i) == 0)) == 0 
df[df$Category %in% names(ind)[ind],]

which gives,

  Category    Item Shop1 Shop2 Shop3
1    Fruit  Apples     4     6     0
2    Fruit Oranges     0     2     7
6    Dairy Yoghurt     1     5     9
7    Dairy    Milk     0     1     0
8    Dairy  Cheese     0     0     7
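For readers who find the one-liner dense, the same logic can be unpacked into intermediate steps. This is just a sketch of how the answer's expression evaluates on the question's data; the names parts, zero_shop and kept are introduced here for illustration and are not part of the answer.

```r
# Example data from the question
df <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item     = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
               "Yoghurt", "Milk", "Cheese"),
  Shop1    = c(4, 0, 0, 0, 0, 1, 0, 0),
  Shop2    = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3    = c(0, 7, 0, 8, 3, 9, 0, 7),
  stringsAsFactors = FALSE
)

# Step 1: one data frame of shop columns per category
parts <- split(df[3:5], df$Category)

# Step 2: for each category, flag the shops whose column total is zero
# (logical matrix: rows = shops, columns = categories)
zero_shop <- sapply(parts, function(i) colSums(i) == 0)
zero_shop

# Step 3: a category passes if none of its shops is flagged
ind <- colSums(zero_shop) == 0

# Step 4: keep only the rows belonging to passing categories
kept <- df[df$Category %in% names(ind)[ind], ]
kept
```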

Source

2017-07-31 14:57:41 Sotos

Or 'rowSums': 'df[rowSums(sapply(df[, -(1:2)], function(a) ave(a, df$Category, FUN = sum) != 0)) == ncol(df[, -(1:2)]), ]' –

@Sotos Thanks for your help, this works well! – Rose

Here is a solution using library(data.table)

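library(data.table)

dt <- data.table(category=c("Fruit","Fruit","Veg","Veg","Veg","Dairy","Dairy","Dairy"),
          item=c("apples","oranges","potatoes","onions","carrots","yoghurt","milk","cheese"),
          shop1=c(4,0,0,0,0,1,0,0),
          shop2=c(6,2,1,2,1,5,1,0),
          shop3=c(0,7,0,8,3,9,0,7))
# Long format: one row per category/item/shop
dt_m <- melt(dt,id.vars = c("category","item"))
# Total sales per category and shop
dt_m[,counts:=sum(value),by=.(category,variable)]
# Drop category/shop combinations with no sales at all
dt_m <- dt_m[counts>0]
dt_m[,counts:=NULL]
# Back to wide format; dropped combinations become NA
dt <- dcast(dt_m,category+item~variable,value.var = "value")
# Rows with an NA belong to a category that some shop does not sell
dt <- na.omit(dt)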

Or a pipe-based solution using dplyr verbs

dt %>% melt(id.vars = c("category","item")) %>% group_by(category,variable) %>% 
    mutate(counts=sum(value)) %>% filter(counts>0) %>% mutate(counts=NULL) %>% 
    dcast(category+item~variable,value.var = "value") %>% na.omit()

Source

2017-07-31 15:02:23 quant

Here is an example using dplyr. First group_by the Category variable, then keep only the categories in which every shop has total sales greater than 0.

library(tidyverse) 
d <- tibble(  # data_frame() is deprecated in current tibble releases
    Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)), 
    Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots", "Yoghurt", "Milk", "Cheese"), 
    Shop1 = c(4, rep(0, 4), 1, rep(0, 2)), 
    Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0), 
    Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7) 
) 

d %>% 
    group_by(Category) %>% 
    filter(sum(Shop1) > 0 & sum(Shop2) > 0 & sum(Shop3) > 0) %>% 
    ungroup()
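If there are many shop columns, spelling out one sum() per shop becomes tedious. One possible generalisation, assuming dplyr 1.1.0 or later (for pick()) and assuming every shop column name starts with "Shop", is sketched below; the name kept is just for illustration.

```r
library(dplyr)

d <- tibble(
  Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
  Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7)
)

kept <- d %>%
  group_by(Category) %>%
  # pick() returns the current group's Shop* columns as a tibble;
  # the group is kept only if every column total is positive
  filter(all(colSums(pick(starts_with("Shop"))) > 0)) %>%
  ungroup()

kept
```

The condition evaluates to a single TRUE or FALSE per group, so each category is kept or dropped as a whole.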

Source

2017-07-31 15:04:53 sinQueso

Another solution using data.table, in two steps.

library(data.table)

# Data
dt <- data.table(Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)), 
       Item  = c("Apples", "Oranges", "Potatoes", "Onions", 
           "Carrots", "Yoghurt", "Milk", "Cheese"), 
       Shop1 = c(4, rep(0, 4), 1, rep(0, 2)), 
       Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0), 
       Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7)) 

filt <- dt[, any(sum(Shop1) == 0, sum(Shop2) == 0, sum(Shop3) == 0), 
      by = Category] 
filt 
   Category    V1
1:    Fruit FALSE
2:      Veg  TRUE
3:    Dairy FALSE

dt[Category %in% filt[V1 == FALSE, Category]] 

   Category    Item Shop1 Shop2 Shop3
1:    Fruit  Apples     4     6     0
2:    Fruit Oranges     0     2     7
3:    Dairy Yoghurt     1     5     9
4:    Dairy    Milk     0     1     0
5:    Dairy  Cheese     0     0     7
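The same two steps can be written without naming each shop, by passing the shop columns through .SDcols. This is a sketch under the assumption that all shop columns start with "Shop"; shop_cols, flags and kept are names chosen for the example.

```r
library(data.table)

dt <- data.table(Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
                 Item  = c("Apples", "Oranges", "Potatoes", "Onions",
                           "Carrots", "Yoghurt", "Milk", "Cheese"),
                 Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
                 Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
                 Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7))

# Assumption: every shop column name starts with "Shop"
shop_cols <- grep("^Shop", names(dt), value = TRUE)

# Step 1: one row per category, TRUE when no shop has a zero total
flags <- dt[, .(keep = all(colSums(.SD) != 0)),
            by = Category, .SDcols = shop_cols]

# Step 2: keep only the rows of the categories that passed
kept <- dt[Category %in% flags[keep == TRUE, Category]]
kept
```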

Source

2017-07-31 15:58:49 snoram
