Subset rows for each group based on a character value in a column and the order of occurrence in the data frame

I have data similar to this:

B <- data.frame(State = c(rep("Arizona", 8), rep("California", 8), rep("Texas", 8)), 
    Account = rep(c("Balance", "Balance", "In the Bimester", "In the Bimester", "Expenses", 
    "Expenses", "In the Bimester", "In the Bimester"), 3), Value = runif(24))

As you can see, Account contains four occurrences of "In the Bimester" for each state, arranged as two "chunks" of two rows each, with "Expenses" between them.

The order matters here, because the first chunk does not refer to the same thing as the second chunk.

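My real data is more complex: it has a fourth variable that indicates what each row of Account means. The number of rows for each Account element (the factor level itself) can vary. For example, in some state the first "chunk" of "In the Bimester" could have 6 rows and the second 7. However, I cannot use this fourth variable to tell the chunks apart.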

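Desired result: for each state, I want to subset my data so that it keeps only the rows of the first "chunk" of "In the Bimester", or only the rows of the second "chunk".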

I have a solution using the data.table package, but I find it a bit clumsy. Any ideas?

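library(data.table)
B <- as.data.table(B)
# Number the rows within each state
B <- B[, .(Account, Value, index = 1:.N), by = .(State)]
# Row index of the first "Expenses" row in each state
x <- B[Account == "Expenses", .(min_ind = min(index)), by = .(State)]
B <- merge(B, x, by = "State")
# Keep the "In the Bimester" rows that come before the first "Expenses" row
B <- B[index < min_ind & Account == "In the Bimester", .(Value), by = .(State)]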

Source

2017-10-04 falecomdino

You can use the dplyr package:

library(dplyr) 
B %>% mutate(helper = data.table::rleid(Account)) %>% 
     filter(Account == "In the Bimester") %>% 
     group_by(State) %>% filter(helper == min(helper)) %>% select(-helper) 

# # A tibble: 6 x 3
# # Groups:   State [3]
#   State      Account              Value
#   <fctr>     <fctr>               <dbl>
# 1 Arizona    In the Bimester 0.17730148
# 2 Arizona    In the Bimester 0.05695585
# 3 California In the Bimester 0.29089678
# 4 California In the Bimester 0.86952723
# 5 Texas      In the Bimester 0.54076144
# 6 Texas      In the Bimester 0.59168138

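If you use max instead of min, you get the last occurrence of "In the Bimester" for each State. You can also exclude the Account column by changing the last step of the pipe to select(-helper, -Account).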
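As a rough sketch of that variant, the block below rebuilds the example data from the question so it runs on its own. The set.seed() call and the name last_chunk are my additions, not part of the original code.

```r
library(dplyr)

# Rebuild the example data from the question; set.seed() is an addition
# so that the random values are reproducible.
set.seed(1)
B <- data.frame(
  State = c(rep("Arizona", 8), rep("California", 8), rep("Texas", 8)),
  Account = rep(c("Balance", "Balance", "In the Bimester", "In the Bimester",
                  "Expenses", "Expenses", "In the Bimester", "In the Bimester"), 3),
  Value = runif(24)
)

# last_chunk is a hypothetical name used only for this illustration.
last_chunk <- B %>%
  mutate(helper = data.table::rleid(Account)) %>%  # run id over the whole column
  filter(Account == "In the Bimester") %>%
  group_by(State) %>%
  filter(helper == max(helper)) %>%                # max instead of min: last chunk
  ungroup() %>%
  select(-helper, -Account)                        # drop the helper and Account
```

With this example data, last_chunk should hold the last two rows of each state, with only the State and Value columns left.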

P.S. If you don't want to use rleid from data.table and would rather use only dplyr functions, take a look at this thread.
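The linked thread is not reproduced here, so the following is only one common way to build a run id with dplyr alone, and not necessarily what that thread suggests. It marks a new run wherever Account differs from the previous row and takes the cumulative sum of those change points. The set.seed() call and the name first_chunk are my additions.

```r
library(dplyr)

# Rebuild the example data from the question; set.seed() is an addition
# so that the random values are reproducible.
set.seed(1)
B <- data.frame(
  State = c(rep("Arizona", 8), rep("California", 8), rep("Texas", 8)),
  Account = rep(c("Balance", "Balance", "In the Bimester", "In the Bimester",
                  "Expenses", "Expenses", "In the Bimester", "In the Bimester"), 3),
  Value = runif(24)
)

# first_chunk is a hypothetical name used only for this illustration.
first_chunk <- B %>%
  group_by(State) %>%
  # A new run starts wherever Account differs from the previous row;
  # the cumulative sum of those change points numbers the runs within each state.
  mutate(helper = cumsum(Account != lag(Account, default = first(Account))) + 1L) %>%
  filter(Account == "In the Bimester") %>%
  filter(helper == min(helper)) %>%   # still grouped by State: first chunk per state
  ungroup() %>%
  select(-helper)
```

Computing the helper inside group_by(State) is a deliberate choice in this sketch: if one state ended with the same Account value that the next state starts with, an ungrouped run id would merge those rows into a single run.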

Source

2017-10-04 21:19:21 Masoud
