Counting labels and creating a summary table in R

Below is a portion of data similar to my dataset:

require(dplyr) 
alldata 
site date percent_rank Label 
01A 2013-01-01 0.32   Normal 
01B 2013-01-01 0.12   Low 
01C 2013-01-01 0.76   High 
02A 2013-01-01  0   N/A 
02B 2013-01-01 0.16   Low 
02C 2013-01-01 0.5   Normal 
01A 2013-01-02 0.67   Normal 
01B 2013-01-02 0.01   Low 
01C 2013-01-02 0.92   High

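I assigned each percent_rank a label based on its value (three categories: 0 to 0.25, 0.25 to 0.75, and 0.75 to 1). I now want to produce a summary table in this format: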

site Low Normal High Missing
01A   32     47   92     194
01B  232     23   17      93
01C   82    265   12       6

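where each site has the number of occurrences of the Low, Normal, and High labels across all dates for that site (there is one record for every day of the year), and N/A values are counted in the "Missing" column.

I have tried the following: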

alldata <- alldata %>% group_by(site) %>% mutate(length(Label == "Low"))

which returns the total number of records rather than the count of "Low" for each site, and

alldata <- alldata %>% group_by(site) %>% mutate(length(which(Label == "Low")))

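which returns several values that are about a thousand higher than the total number of records. My idea was to repeat this function, creating four new columns with four separate mutate lines (one per category), which would produce my summary table. I also tried some variations of aggregate(), although the function argument was not clear enough to me for this goal. This seems like it should be a very simple thing to do (and group_by worked well for calculating the percent ranks and the associated labels), but I have not found a solution yet. Any tips are much appreciated!

Source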

2016-06-21 acersaccharum

There is a 'count' function in the 'dplyr' package. Maybe that helps. – user2100721

If you use 'which', then 'length' is enough, but with a logical vector, 'sum' gives the count. – akrun
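A minimal sketch of the point made in the comment above, assuming the goal is one row per site. The small `alldata` frame is a stand-in built from the sample rows in the question, and the name `summary_tbl` is only illustrative. It uses `summarise` rather than `mutate`, so each site collapses to a single row.

```r
library(dplyr)

# Stand-in for the question's data (site and Label copied from the sample rows)
alldata <- data.frame(
  site  = c("01A", "01B", "01C", "02A", "02B", "02C", "01A", "01B", "01C"),
  Label = c("Normal", "Low", "High", "N/A", "Low", "Normal", "Normal", "Low", "High"),
  stringsAsFactors = FALSE
)

# length() on a logical vector counts every element, TRUE or FALSE,
# so it returns the group size. sum() counts only the TRUE values.
summary_tbl <- alldata %>%
  group_by(site) %>%
  summarise(
    Low     = sum(Label == "Low"),
    Normal  = sum(Label == "Normal"),
    High    = sum(Label == "High"),
    Missing = sum(Label == "N/A")   # assumes missing values are the literal string "N/A"
  )

print(summary_tbl)
```

If the missing labels are real `NA` values rather than the string "N/A", `sum(is.na(Label))` would be the corresponding count, and the other sums would need `na.rm = TRUE`.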

We can use dcast from data.table, which also has a fun.aggregate argument and is very fast.

library(data.table) 
dcast(setDT(alldata), site~Label, length)

Or with dplyr/tidyr

library(dplyr) 
library(tidyr) 
alldata %>% 
    group_by(site, Label) %>% 
    tally() %>% 
    spread(Label, n)

A base R option is

reshape(aggregate(date~site + Label, alldata, length), 
      idvar = "site", timevar="Label", direction="wide")
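As a variation on the dplyr/tidyr approach above, here is a sketch that also fills absent site/label combinations with 0 and renames the "N/A" column to "Missing", as the question's target table asks. It uses `pivot_wider`, which needs a newer tidyr than the `spread` call above; the sample data frame and the name `wide` are illustrative assumptions.

```r
library(dplyr)
library(tidyr)

# Stand-in for the question's data (site and Label copied from the sample rows)
alldata <- data.frame(
  site  = c("01A", "01B", "01C", "02A", "02B", "02C", "01A", "01B", "01C"),
  Label = c("Normal", "Low", "High", "N/A", "Low", "Normal", "Normal", "Low", "High"),
  stringsAsFactors = FALSE
)

wide <- alldata %>%
  count(site, Label) %>%                      # one row per site/Label pair, count in column n
  pivot_wider(names_from = Label,
              values_from = n,
              values_fill = list(n = 0)) %>%  # combinations that never occur become 0, not NA
  rename(Missing = `N/A`) %>%                 # match the column name in the desired output
  select(site, Low, Normal, High, Missing)    # put the columns in the requested order

print(wide)
```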

Source

2016-06-21 18:32:39 akrun

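This is perfect! I'm familiar with the 'dplyr' and 'tidyr' packages, so it fits in well with the format of my other code. Thanks @akrun, and everyone else for the quick responses. – acersaccharum

There are three ways to do this with dplyr (each reshaped to wide format with dcast from reshape2). The first is the most verbose, and the other two use convenience functions that shorten the code: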

library(reshape2) 
library(dplyr) 

alldata %>% group_by(site, Label) %>% summarise(n=n()) %>% dcast(site ~ Label) 

alldata %>% group_by(site, Label) %>% tally %>% dcast(site ~ Label) 

alldata %>% count(site, Label) %>% dcast(site ~ Label)

Source

2016-06-21 18:35:50 eipi10

To produce just the summary table, you can use table:

with(df, table(site, Label, useNA="ifany"))[, c(2,4,1,3)] 

     Label
site  Low Normal High N/A
  01A   0      2    0   0
  01B   2      0    0   0
  01C   0      0    2   0
  02A   0      0    0   1
  02B   1      0    0   0
  02C   0      1    0   0

Data

df <- read.table(header=T, text="site date percent_rank Label 
01A 2013-01-01 0.32   Normal 
01B 2013-01-01 0.12   Low 
01C 2013-01-01 0.76   High 
02A 2013-01-01  0   N/A 
02B 2013-01-01 0.16   Low 
02C 2013-01-01 0.5   Normal 
01A 2013-01-02 0.67   Normal 
01B 2013-01-02 0.01   Low 
01C 2013-01-02 0.92   High")
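The column index `c(2,4,1,3)` in the `table` call above depends on the alphabetical order of the labels. A sketch of an alternative, assuming the labels are exactly "Low", "Normal", "High", and "N/A", sets the order and the "Missing" name through explicit factor levels instead. The names `lab` and `tab` are illustrative.

```r
# Stand-in for the answer's df (site and Label copied from the sample rows)
df <- data.frame(
  site  = c("01A", "01B", "01C", "02A", "02B", "02C", "01A", "01B", "01C"),
  Label = c("Normal", "Low", "High", "N/A", "Low", "Normal", "Normal", "Low", "High"),
  stringsAsFactors = FALSE
)

# Explicit levels fix the column order; labels rename "N/A" to "Missing"
lab <- factor(df$Label,
              levels = c("Low", "Normal", "High", "N/A"),
              labels = c("Low", "Normal", "High", "Missing"))

tab <- table(site = df$site, Label = lab)
print(tab)

# Optional: convert the contingency table to a plain data frame of counts
tab_df <- as.data.frame.matrix(tab)
```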

Source

2016-06-21 18:39:49 lmo
