2017-07-31 30 views
0

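Subset in R and filter out rows if a subset of a column has all zero values

This question is a variant of one I asked (very) recently here. (Apologies for posting two similar questions. I realised that the question I originally asked wasn't quite the right one, but I thought I would leave the original up in case it is useful to someone in the future, and ask this one separately.)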

I have a set of data that looks like this, slightly modified from the previous question:

Category  Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
3  Veg Potatoes  0  1  0 
4  Veg Onions  0  2  8 
5  Veg Carrots  0  1  3 
6 Dairy Yoghurt  1  5  9 
7 Dairy  Milk  0  1  0 
8 Dairy Cheese  0  0  7 

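I want to filter my data so that I only keep categories that are sold by all shops: if any shop has no sales at all for an entire category, I want to filter that category out. In this example, the Veg category would be filtered out, because Shop1 has no Veg sales.

To solve this, I tried changing the suggestion I was given previously from FUN = any to FUN = all, but this didn't work and threw an error each time, and I'm not sure what else to try.

I would appreciate any help you can offer.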

+0

You could try getting the sum of each subset and filtering it out if it equals 0. I can see this being done with the 'dplyr' package. – spicypumpkin
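The "sum each subset, then filter on zero" idea from the comment can be sketched in base R as follows. This is only a minimal illustration, not the commenter's own code: it uses rowsum() rather than dplyr, and the names shop_cols, totals, keep and filtered are made up for the example. The data frame is rebuilt from the table in the question.

```r
# Data from the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
  Item     = c("Apples", "Oranges", "Potatoes", "Onions",
               "Carrots", "Yoghurt", "Milk", "Cheese"),
  Shop1    = c(4, 0, 0, 0, 0, 1, 0, 0),
  Shop2    = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3    = c(0, 7, 0, 8, 3, 9, 0, 7),
  stringsAsFactors = FALSE
)

# Pick up every shop column by name (assumes they all start with "Shop")
shop_cols <- grep("^Shop", names(df), value = TRUE)

# One row per category, one column per shop, each cell a sales total
totals <- rowsum(df[shop_cols], df$Category)

# Keep only the categories in which no shop has a zero total
keep <- rownames(totals)[rowSums(totals == 0) == 0]
filtered <- df[df$Category %in% keep, ]
filtered
```

Because the shop columns are selected by pattern, the same sketch should carry over unchanged if more shops are added.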

Answers

4

Here is an idea with colSums:

ind <- colSums(sapply(split(df[3:5], df$Category), function(i) colSums(i) == 0)) == 0 
df[df$Category %in% names(ind)[ind],] 

which gives,

Category Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
6 Dairy Yoghurt  1  5  9 
7 Dairy Milk  0  1  0 
8 Dairy Cheese  0  0  7 
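The one-liner above packs several steps together. A step-by-step version, assuming df is a data frame built from the table in the question (the intermediate names by_cat and zero_totals are introduced here only for illustration), might look like this:

```r
# Data from the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
  Item     = c("Apples", "Oranges", "Potatoes", "Onions",
               "Carrots", "Yoghurt", "Milk", "Cheese"),
  Shop1    = c(4, 0, 0, 0, 0, 1, 0, 0),
  Shop2    = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3    = c(0, 7, 0, 8, 3, 9, 0, 7),
  stringsAsFactors = FALSE
)

# Step 1: split the three shop columns into one data frame per category
by_cat <- split(df[3:5], df$Category)

# Step 2: for each category, flag the shops whose total is zero
# (a logical matrix with shops as rows and categories as columns)
zero_totals <- sapply(by_cat, function(i) colSums(i) == 0)

# Step 3: a category is kept only if none of its shops is flagged
ind <- colSums(zero_totals) == 0

# Step 4: keep the rows that belong to the surviving categories
result <- df[df$Category %in% names(ind)[ind], ]
result
```

Printing zero_totals is a quick way to see which shop and category combination causes a category to be dropped (here, Shop1 for Veg).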
+1

Or 'rowSums': 'df[rowSums(sapply(df[, -(1:2)], function(a) ave(a, df$Category, FUN = sum) != 0)) == NCOL(df[, -(1:2)]), ]' –
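For readers unfamiliar with ave(), the rowSums variant in this comment can be unpacked roughly as below. This is a sketch under the assumption that df is the data frame from the question; the names shops, cat_total_nonzero and rows_to_keep are illustrative only.

```r
# Data from the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
  Item     = c("Apples", "Oranges", "Potatoes", "Onions",
               "Carrots", "Yoghurt", "Milk", "Cheese"),
  Shop1    = c(4, 0, 0, 0, 0, 1, 0, 0),
  Shop2    = c(6, 2, 1, 2, 1, 5, 1, 0),
  Shop3    = c(0, 7, 0, 8, 3, 9, 0, 7),
  stringsAsFactors = FALSE
)

# Drop the two label columns, leaving only the shop columns
shops <- df[, -(1:2)]

# ave() repeats each category's total on every row of that category,
# so this matrix says, per row and shop, whether the category total is non-zero
cat_total_nonzero <- sapply(shops, function(a) ave(a, df$Category, FUN = sum) != 0)

# A row survives only if that holds for every shop column
rows_to_keep <- rowSums(cat_total_nonzero) == NCOL(shops)
df[rows_to_keep, ]
```

Unlike the split-based version, this keeps everything row-aligned with df, so no lookup by category name is needed at the end.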

+0

@Sotos Thanks for your help, this works great! – Rose

1

Here is a solution using library(data.table):

dt <- data.table(category=c("Fruit","Fruit","Veg","Veg","Veg","Dairy","Dairy","Dairy"), 
          item=c("apples","oranges","potatoes","onions","carrots","yoghurt","milk","cheese"), 
          shop1=c(4,0,0,0,0,1,0,0), 
          shop2=c(6,2,1,2,1,5,1,0), 
          shop3=c(0,7,0,8,3,9,0,7)) 
dt_m <- melt(dt,id.vars = c("category","item")) 
dt_m[,counts:=sum(value),by=.(category,variable)] 
dt_m <- dt_m[counts>0] 
dt_m[,counts:=NULL] 
dt <- dcast.data.table(dt_m,category+item~variable,value.var = "value") 
dt <- na.omit(dt) 

Or a piped solution using dplyr verbs:

dt %>% melt(id.vars = c("category","item")) %>% group_by(category,variable) %>% 
    mutate(counts=sum(value)) %>% filter(counts>0) %>% mutate(counts=NULL) %>% 
    dcast(category+item~variable,value.var = "value") %>% na.omit() 
3

Here is an example using dplyr. You first group_by the Category variable, and then keep only the groups in which every shop has total sales greater than 0.

library(tidyverse) 
d <- tibble(  # data_frame() is deprecated in favour of tibble()
    Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)), 
    Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots", "Yoghurt", "Milk", "Cheese"), 
    Shop1 = c(4, rep(0, 4), 1, rep(0, 2)), 
    Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0), 
    Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7) 
) 

d %>% 
    group_by(Category) %>% 
    filter(sum(Shop1) > 0 & sum(Shop2) > 0 & sum(Shop3) > 0) %>% 
    ungroup() 
0

Another solution using data.table, in two steps.

library(data.table)

# Data
dt <- data.table(Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)), 
       Item  = c("Apples", "Oranges", "Potatoes", "Onions", 
           "Carrots", "Yoghurt", "Milk", "Cheese"), 
       Shop1 = c(4, rep(0, 4), 1, rep(0, 2)), 
       Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0), 
       Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7)) 

filt <- dt[, any(sum(Shop1) == 0, sum(Shop2) == 0, sum(Shop3) == 0), 
      by = Category] 
filt 
     Category V1 
1: Fruit FALSE 
2:  Veg TRUE 
3: Dairy FALSE 

dt[Category %in% filt[V1 == FALSE, Category]] 

    Category Item Shop1 Shop2 Shop3 
1: Fruit Apples  4  6  0 
2: Fruit Oranges  0  2  7 
3: Dairy Yoghurt  1  5  9 
4: Dairy Milk  0  1  0 
5: Dairy Cheese  0  0  7 