Filter rows in R based on values in multiple rows

I am trying to filter out multiple rows of unwanted data in R, but I don't know how to do it.

The data I'm using looks a bit like this:

Category  Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
3  Veg Potatoes  0  0  0 
4  Veg Onions  0  0  0 
5  Veg Carrots  0  0  0 
6 Dairy Yoghurt  0  0  0 
7 Dairy  Milk  0  1  0 
8 Dairy Cheese  0  0  0

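I only want to keep the categories in which at least one item has a positive value in at least one shop.

In this case, I want to get rid of all the Veg rows, because no shop sells any vegetables. I want to keep all the Fruit rows, and I want to keep all the Dairy rows, even those with zero in every shop, because one of the Dairy rows does have a value greater than 0.

I tried using colSums after group_by(Category), hoping it would sum the contents of each category, but it didn't work. I also tried adding a rowSums column at the end and filtering on it, but that way I can only filter individual rows, not rows based on the whole category.

While I can filter out individual all-zero rows (e.g. row 3), my difficulty is with rows like 6 and 8, where every shop's value is zero but which I want to keep because other Dairy rows have values greater than zero.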

Source

2017-07-31 Rose

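1) subset/ave: rowSums(...) > 0 has one element per row. That element is TRUE if the row contains a nonzero value. This assumes that negative values are not possible. (If negative values are possible, use rowSums(DF[-1:-2]^2) > 0 instead.) It also assumes that the shops are all the columns other than the first two; in particular, it works for any number of shops. ave then produces TRUE for every row of each group in which any of those values is TRUE, and subset keeps only those rows. No packages are used.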

subset(DF, ave(rowSums(DF[-1:-2]) > 0, Category, FUN = any))

giving:

Category Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
6 Dairy Yoghurt  0  0  0 
7 Dairy Milk  0  1  0 
8 Dairy Cheese  0  0  0
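
To see why this works, it may help to look at the intermediate vectors one step at a time. The following is a minimal sketch that rebuilds the example data inline; the names row_has_sales and group_has_sales are illustrative only and do not appear in the answer.

```r
# Rebuild the example data (same values as in the question).
DF <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L),
  Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Step 1: one logical per row, TRUE if that row has any sales
# (assumes no negative values, as stated in the answer).
row_has_sales <- rowSums(DF[-1:-2]) > 0

# Step 2: ave() applies any() within each Category and recycles the
# single result over that group's rows, so every row of a group
# carries the same flag.
group_has_sales <- ave(row_has_sales, DF$Category, FUN = any)

# Step 3: keep the rows whose group has at least one sale.
result <- DF[group_has_sales, ]
print(result)
```

Because the flag is computed per group but stored per row, it can be used directly as a row index, which is what subset does in the one-liner.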

1a) A variation of this, if you don't mind hard-coding the shops, would be the following:

subset(DF, ave(Shop1 + Shop2 + Shop3 > 0, Category, FUN = any))

2) dplyr

library(dplyr) 
DF %>% group_by(Category) %>% filter(any(Shop1, Shop2, Shop3)) %>% ungroup

giving:

# A tibble: 5 x 5 
# Groups: Category [2] 
    Category Item Shop1 Shop2 Shop3 
    <fctr> <fctr> <int> <int> <int> 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
3 Dairy Yoghurt  0  0  0 
4 Dairy Milk  0  1  0 
5 Dairy Cheese  0  0  0

3) Filter/split: another base solution is:

do.call("rbind", Filter(function(x) any(x[-1:-2]), split(DF, DF$Category)))

giving:

 Category Item Shop1 Shop2 Shop3 
Dairy.6 Dairy Yoghurt  0  0  0 
Dairy.7 Dairy Milk  0  1  0 
Dairy.8 Dairy Cheese  0  0  0 
Fruit.1 Fruit Apples  4  6  0 
Fruit.2 Fruit Oranges  0  2  7
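
Note that rbind prefixes the row names with the group name and returns the groups in factor-level order rather than in the original row order. If that matters, one possible variation (a sketch, not part of the answer) is to use Filter only to find the names of the groups to keep and then index the original data frame. The name kept is illustrative, and the explicit > 0 comparison is an assumption that only positive values count as sales.

```r
# Rebuild the example data (same values as in the question).
DF <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L),
  Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Split into one data frame per Category and keep only the pieces
# in which at least one shop value is positive.
kept <- Filter(function(x) any(x[-1:-2] > 0), split(DF, DF$Category))

# names(kept) holds the surviving categories; indexing the original
# data frame preserves its row order and row names.
result <- DF[DF$Category %in% names(kept), ]
print(result)
```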

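4) dplyr/tidyr: use gather to convert the data to long form, in which there is one row per value, and then filter the groups using any. Finally, convert back to wide form.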

library(dplyr) 
library(tidyr) 
DF %>% 
    gather(shop, value, -(Category:Item)) %>% 
    group_by(Category) %>% 
    filter(any(value)) %>% 
    ungroup %>% 
    spread(shop, value)

giving:

# A tibble: 5 x 5 
    Category Item Shop1 Shop2 Shop3 
* <fctr> <fctr> <int> <int> <int> 
1 Dairy Cheese  0  0  0 
2 Dairy Milk  0  1  0 
3 Dairy Yoghurt  0  0  0 
4 Fruit Apples  4  6  0 
5 Fruit Oranges  0  2  7
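
In more recent versions of tidyr, gather and spread have been superseded by pivot_longer and pivot_wider. A hedged sketch of the same idea with those functions follows. It assumes tidyr 1.0.0 or later and that the shop columns can be selected with starts_with("Shop"), which is an assumption about the column naming rather than something the answer states.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data (same values as in the question).
DF <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L),
  Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)
)

result <- DF %>%
  # Long form: one row per (Category, Item, shop) combination.
  pivot_longer(starts_with("Shop"), names_to = "shop", values_to = "value") %>%
  group_by(Category) %>%
  # Keep whole groups with at least one positive value; the explicit
  # comparison avoids coercing integers to logical inside any().
  filter(any(value > 0)) %>%
  ungroup() %>%
  # Back to wide form: one column per shop.
  pivot_wider(names_from = shop, values_from = value)

print(result)
```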

Note: the input in reproducible form is:

Lines <- " Category  Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
3  Veg Potatoes  0  0  0 
4  Veg Onions  0  0  0 
5  Veg Carrots  0  0  0 
6 Dairy Yoghurt  0  0  0 
7 Dairy  Milk  0  1  0 
8 Dairy Cheese  0  0  0" 

DF <- read.table(text = Lines)

Source

2017-07-31 12:34:44

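This is nice: feeding ave a logical vector as its first argument means the final output can be used directly for subsetting. – lmo

Wow, thank you for the multiple solutions and the clear explanations! – Rose

Here is a base R method using rowSums, ave, and [.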

dat[ave(rowSums(dat[grep("Shop", names(dat))]), dat$Category, FUN=max) > 0,]

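rowSums calculates the total sales across the shop variables for each row (using grep to select those columns). The resulting vector is fed to ave, which groups it by dat$Category and returns the maximum sales within each group. Finally, the original data frame is subset based on whether that group maximum is positive.

This returns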

Category Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
6 Dairy Yoghurt  0  0  0 
7 Dairy Milk  0  1  0 
8 Dairy Cheese  0  0  0
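
The same steps can be wrapped in a small reusable function. This is only a sketch: the function name keep_groups_with_sales and its arguments group_col and value_pattern are hypothetical and are not part of the answer. Like the one-liner, it assumes the values are non-negative.

```r
# Hypothetical helper (not from the answer): keep all rows of every
# group in which at least one row has a positive total across the
# columns whose names match value_pattern.
keep_groups_with_sales <- function(dat, group_col, value_pattern) {
  value_cols <- grep(value_pattern, names(dat))       # positions of the value columns
  row_totals <- rowSums(dat[value_cols])               # total per row
  group_max <- ave(row_totals, dat[[group_col]], FUN = max)  # per-group maximum, repeated per row
  dat[group_max > 0, ]
}

# Rebuild the example data (same values as in the question).
dat <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L),
  Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)
)

result <- keep_groups_with_sales(dat, "Category", "^Shop")
print(result)
```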

Data

dat <- 
structure(list(Category = structure(c(2L, 2L, 3L, 3L, 3L, 1L, 
1L, 1L), .Label = c("Dairy", "Fruit", "Veg"), class = "factor"), 
    Item = structure(c(1L, 6L, 7L, 5L, 2L, 8L, 4L, 3L), .Label = c("Apples", 
    "Carrots", "Cheese", "Milk", "Onions", "Oranges", "Potatoes", 
    "Yoghurt"), class = "factor"), Shop1 = c(4L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L), Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L 
    ), Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("Category", 
"Item", "Shop1", "Shop2", "Shop3"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

Source

2017-07-31 12:33:14 lmo

Nice. I was about to post df[!!ave(rowSums(df[3:5]), df$Category, FUN = function(i) sum(i) > 0), ] – Sotos
