2014-02-25

Grep records on multiple criteria

I have a list of vendors and billed amounts, and I have grouped the billed amounts into buckets.

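I want to subset the dataset to only those vendors that have amounts in the '<= 100' bucket as well as in either the '500 - 1000' bucket or the '> 1000' bucket. Sample data: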

df <- structure(list(GrossAmt = c(74.37, 69.69, 705.76, 694.12, 5243, 
2680.95, 23270, 64.31, 64.31, 64.31, 1863.6, 4030.38, 43.86, 
36.57, 37.29, 31.02, 59.43, 27.65), VenName = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 1L, 1L, 
1L), .Label = c("Labcorp", "Quest Diagnostics Incorporated", 
"THOMAS JEFFERSON UNIV HOSPITAL", "WASHINGTON HOSPITAL CENTER" 
), class = "factor"), AmtGrp = structure(c(1L, 1L, 3L, 3L, 2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("<= 100", 
"> 1000", "500 - 1000"), class = "factor")), .Names = c("GrossAmt", 
"VenName", "AmtGrp"), class = "data.frame", row.names = c(NA, 
-18L)) 

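In my example, the resulting dataset would contain all records from TJU Hospital and Washington Hospital Center, because each of them has bills in the <= $100 bucket and in one of the higher buckets. The other vendors would be filtered out because they have no bills over $500.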

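I would show what I have tried so far, but honestly I don't know where to start with this, so forgive me. My first instinct is that I need to set up a grep command on the records based on the grouping criteria, but I don't know how to match on the vendor's name.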

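Edit - expanded question: regardless of what the specific amount groups are, how can I filter to any vendor that falls into more than one amount group?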

Answers

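library(dplyr)

# Keep every row of a vendor that has at least one '<= 100' bill
# and at least one bill outside the '<= 100' bucket
df %>%
  group_by(VenName) %>%
  filter(any(AmtGrp == '<= 100'),
         !all(AmtGrp == '<= 100'))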

Edit: for the second question:

# Keep vendors whose bills fall into more than one amount group
df %>%
  group_by(VenName) %>%
  filter(n_distinct(AmtGrp) > 1)
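The question asks how to "match on the vendor's name"; grep is not needed for that. As a sketch (not from the original answers), the same vendor-level test can be written in base R by first computing which vendors qualify with `tapply()` and then keeping rows whose `VenName` is in that set with `%in%`. The compact `data.frame()` rebuild of the sample data is an assumption made so the block runs on its own; it should match the `dput()` output in the question.

```r
# Sample data from the question, rebuilt with data.frame() for brevity
df <- data.frame(
  GrossAmt = c(74.37, 69.69, 705.76, 694.12, 5243, 2680.95, 23270,
               64.31, 64.31, 64.31, 1863.6, 4030.38,
               43.86, 36.57, 37.29, 31.02, 59.43, 27.65),
  VenName = factor(rep(c("THOMAS JEFFERSON UNIV HOSPITAL", "WASHINGTON HOSPITAL CENTER",
                         "Quest Diagnostics Incorporated", "Labcorp"),
                       times = c(7, 5, 3, 3))),
  AmtGrp = factor(c("<= 100", "<= 100", "500 - 1000", "500 - 1000",
                    "> 1000", "> 1000", "> 1000",
                    "<= 100", "<= 100", "<= 100", "> 1000", "> 1000",
                    rep("<= 100", 6)))
)

# Per vendor: does it have at least one "<= 100" bill?
has_low <- tapply(df$AmtGrp == "<= 100", df$VenName, any)
# Per vendor: does it have at least one bill in a higher bucket?
has_high <- tapply(df$AmtGrp %in% c("500 - 1000", "> 1000"), df$VenName, any)

# Names of the vendors meeting both conditions
keep <- names(which(has_low & has_high))

# Keep every row whose vendor is in that set (match() compares factor labels)
result <- df[df$VenName %in% keep, ]
print(result)
```

Splitting the work into a per-vendor summary followed by a row filter makes the "match on vendor name" step explicit, which may be easier to adapt if the criteria change.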

Here is a solution with the base functions ave and subset:

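# ave() evaluates FUN within each vendor and recycles the result to every row of that vendor;
# because x is character, the logical result comes back as "TRUE"/"FALSE", hence as.logical()
subset(df, as.logical(ave(as.character(AmtGrp), VenName, FUN = function(x)
    any(x == "<= 100") & any(x %in% c("500 - 1000", "> 1000")))))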

   GrossAmt                        VenName     AmtGrp
1     74.37 THOMAS JEFFERSON UNIV HOSPITAL     <= 100
2     69.69 THOMAS JEFFERSON UNIV HOSPITAL     <= 100
3    705.76 THOMAS JEFFERSON UNIV HOSPITAL 500 - 1000
4    694.12 THOMAS JEFFERSON UNIV HOSPITAL 500 - 1000
5   5243.00 THOMAS JEFFERSON UNIV HOSPITAL     > 1000
6   2680.95 THOMAS JEFFERSON UNIV HOSPITAL     > 1000
7  23270.00 THOMAS JEFFERSON UNIV HOSPITAL     > 1000
8     64.31     WASHINGTON HOSPITAL CENTER     <= 100
9     64.31     WASHINGTON HOSPITAL CENTER     <= 100
10    64.31     WASHINGTON HOSPITAL CENTER     <= 100
11  1863.60     WASHINGTON HOSPITAL CENTER     > 1000
12  4030.38     WASHINGTON HOSPITAL CENTER     > 1000
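The ave/subset answer covers only the first question. A minimal sketch of the same base-R approach for the expanded question (any vendor in more than one amount group) might count distinct groups per vendor with `ave()`. This is not part of the original answer, and the `data.frame()` rebuild of the sample data is an assumption made so the block is self-contained.

```r
# Sample data from the question, rebuilt with data.frame() for brevity
df <- data.frame(
  GrossAmt = c(74.37, 69.69, 705.76, 694.12, 5243, 2680.95, 23270,
               64.31, 64.31, 64.31, 1863.6, 4030.38,
               43.86, 36.57, 37.29, 31.02, 59.43, 27.65),
  VenName = factor(rep(c("THOMAS JEFFERSON UNIV HOSPITAL", "WASHINGTON HOSPITAL CENTER",
                         "Quest Diagnostics Incorporated", "Labcorp"),
                       times = c(7, 5, 3, 3))),
  AmtGrp = factor(c("<= 100", "<= 100", "500 - 1000", "500 - 1000",
                    "> 1000", "> 1000", "> 1000",
                    "<= 100", "<= 100", "<= 100", "> 1000", "> 1000",
                    rep("<= 100", 6)))
)

# For each vendor, count its distinct amount groups;
# ave() repeats that count on every row belonging to the vendor
n_groups <- ave(as.integer(df$AmtGrp), df$VenName,
                FUN = function(x) length(unique(x)))

# Keep rows of vendors that appear in more than one amount group
result2 <- df[n_groups > 1, ]
print(result2)
```

Using the integer factor codes keeps the result numeric, so no `as.logical()` conversion is needed as in the answer above.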