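Unique instances across two groups in R

I am trying to determine the number of unique customers per week for each store.

I have a piece of code that does something close to this, but the result it lists is not what I am looking for.

I have a table like the following: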

store week customer_ID
1     1    1
1     1    1
1     1    2
1     2    1
1     2    2
1     2    3
2     1    1
2     1    1
2     1    2
2     2    2
2     2    3
2     2    3

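So for each week I need to count how many unique customers there are.

For example, if customer 1 visited in week 1 and then visited again in week 2, the second visit would not count as a unique visit.

If the same customer visited store 2 in week 1 or in any other week, that would count as a unique visit for the second store.

The result would look like this: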

store week unique_customers
1     1    2
1     2    1
2     1    2
2     2    1

I used the following, but it does not give the correct result:

agg <- aggregate(data=df, customer_ID~ week+store, function(x) length(unique(x))) 

df <- structure(list(store = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L), week = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 2L), customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 
2L, 3L, 3L)), .Names = c("store", "week", "customer_ID"), class = "data.frame", row.names = c(NA, 
-12L))
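
To make the mismatch concrete, here is a minimal sketch that runs the attempt on the sample data. It rebuilds the table with data.frame() so the snippet is self-contained. The attempt counts distinct customers within each store-week, so returning customers are counted again in later weeks.

```r
# Sample data from the question
df <- data.frame(
  store       = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  week        = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
  customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)

# The attempt: distinct customers within each store-week
agg <- aggregate(customer_ID ~ week + store, data = df,
                 FUN = function(x) length(unique(x)))

# Rows come back as (week 1, store 1), (week 2, store 1),
# (week 1, store 2), (week 2, store 2)
agg$customer_ID   # 2 3 2 2, whereas the desired counts are 2 1 2 1
```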

Source

2016-12-15 daveDo

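Here is a base R method. The idea is to split the data into a list of data.frames, one per store. Assuming the observations are sorted by week, the rows with duplicated customer IDs are dropped within each store, so only each customer's first visit remains. The subsetted data.frame is then aggregated with your aggregate call, using length since the duplicates are already gone. Finally, do.call and rbind combine the results into a single data.frame: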

do.call(rbind, lapply(split(df, df$store),
        function(i) aggregate(data = i[!duplicated(i$customer_ID), ],
                              customer_ID ~ week + store, length)))

    week store customer_ID
1.1    1     1           2
1.2    2     1           1
2.1    1     2           2
2.2    2     2           1

To make sure your data.frame is properly ordered before trying this, you can use order:

df <- df[order(df$store, df$week), ]
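
The sort, de-duplicate, and aggregate steps can also be sketched without the explicit split, by calling duplicated() on the store and customer_ID columns together. This is a variation on the method above, not a replacement for it. The names first_visits and new_cust are made up for illustration.

```r
# Sample data from the question
df <- data.frame(
  store       = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  week        = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
  customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)

# Sort so that each customer's earliest week comes first within a store
df <- df[order(df$store, df$week), ]

# Keep only the first row for each (store, customer_ID) pair
first_visits <- df[!duplicated(df[c("store", "customer_ID")]), ]

# Count first-time customers per store-week
new_cust <- aggregate(customer_ID ~ week + store, data = first_visits,
                      FUN = length)
new_cust
```

Note that a store-week with no first-time customers does not appear in the result at all, because aggregate only reports groups that are present in first_visits.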

In case it is of interest, I'll also provide a data.table solution.

library(data.table)
setDT(df)

df[df[, !duplicated(customer_ID), by=store]$V1,
   .(newCust=length(customer_ID)), by=.(store, week)]

   store week newCust
1:     1    1       2
2:     1    2       1
3:     2    1       2
4:     2    2       1

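This method uses the logical vector df[, !duplicated(customer_ID), by=store]$V1 to subset the data to the first occurrence of each customer ID within each store, and then counts the number of new customers by store-week.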
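
As a hedged alternative sketch of the same idea, unique() with a by argument keeps the first row for each store and customer pair directly. This avoids depending on the grouped logical vector lining up with the table's row order, which it does only when each store's rows are contiguous. The names dt, first_visits, and new_cust are made up for illustration.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  store       = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  week        = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
  customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)

# Sort by reference so the earliest week comes first within each store
setorder(dt, store, week)

# Keep the first row for each (store, customer_ID) pair
first_visits <- unique(dt, by = c("store", "customer_ID"))

# Count first-time customers per store-week
new_cust <- first_visits[, .(newCust = .N), by = .(store, week)]
new_cust
```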

Source

2016-12-15 19:09:24 lmo

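So before I run that code, should I sort by week? Or by week and store? – daveDo

@Imo For clarity, could you include the sorting/ordering function? – daveDo

I wonder whether there could be a data.table/dplyr solution as well? –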
