R: Group by multiple columns and count

I have the following data frame, df:

LeftOrRight SpeedCategory NumThruLanes
R           25to45        3
L           45to62        2
R           Gt62          1

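I want to group by SpeedCategory and then, going through each of the other columns in turn, get the frequency of each unique code within each speed category, something like this: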

                 25to45 45to62 Gt62
LeftOrRight  L        0      1    0
             R        1      0    1
NumThruLanes 1        0      0    1
             2        0      1    0
             3        1      0    0

The closest I've been able to get is this:

for (col in df) {
  # col holds the column's values (not its name); SpeedCategory itself is looped over too
  tbl <- table(col, df$SpeedCategory)
  print(tbl)
}

which prints the following (first LeftOrRight, then NumThruLanes):

col 25to45 45to62 Gt62
  L      0      1    0
  R      1      0    1

col 25to45 45to62 Gt62
  1      0      0    1
  2      0      1    0
  3      1      0    0
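A minimal base-R sketch of the same loop, assuming df holds the three example rows shown above: looping over column *names* instead of columns keeps each name available, setdiff() skips SpeedCategory itself, and the dnn argument of table() replaces the generic "col" label. The variables group_col and tabs are illustrative names, not from the original post.

```r
# Example data from the question (assumed to match the real df)
df <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1", header = TRUE)

group_col <- "SpeedCategory"  # hypothetical name for the grouping column
tabs <- list()                # one contingency table per remaining column

# Loop over column names, skipping the grouping column itself
for (nm in setdiff(names(df), group_col)) {
  # dnn labels the table dimensions with the real column names instead of "col"
  tabs[[nm]] <- table(df[[nm]], df[[group_col]], dnn = c(nm, group_col))
  print(tabs[[nm]])
}
```

Because each table is also stored in tabs, the results can be combined afterwards instead of only being printed.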

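I'm sure I could accomplish my goal with aggregate(), or maybe group_by from dplyr, but I'm new to R and can't figure out the syntax. In pandas I would use a MultiIndex, but I don't know what the R equivalent is, so it's hard to Google.

I'd like to do everything with a loop (or loops), since I have a dozen or so columns to go through.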
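R data frames have no hierarchical row index like a pandas MultiIndex; a common substitute is a long data frame with one column per index level. The sketch below uses base R only (table() rather than aggregate() or dplyr) and stacks the counts for every non-grouping column into such a table. It assumes df holds the example rows; group_col and long are illustrative names.

```r
df <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1", header = TRUE)

group_col <- "SpeedCategory"  # hypothetical name for the grouping column

long <- do.call(rbind, lapply(setdiff(names(df), group_col), function(col) {
  # Cross-tabulate one column against the grouping column;
  # dnn names the dimensions, and those names become column names below
  tab <- table(df[[col]], df[[group_col]], dnn = c("level", group_col))
  # One row per (level, SpeedCategory) pair, zero counts included
  counts <- as.data.frame(tab, stringsAsFactors = FALSE)
  # Record which original column these levels came from
  data.frame(variable = col, counts, stringsAsFactors = FALSE)
}))
long
```

The columns variable, level and SpeedCategory play the role of the index levels, and Freq holds the count.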


2016-12-23 ale19

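The tables package makes it easy to format tables in particular ways. The syntax takes some getting used to, but for this problem it is quite straightforward: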

exd <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R   25to45   3
L   45to62   2
R   Gt62   1", header = TRUE)

## to get counts by default we need everything to be categorical,
## so convert every column (not only SpeedCategory) to a factor
exd[] <- lapply(exd, factor)

library(tables)
tabular(LeftOrRight + NumThruLanes ~ SpeedCategory, data = exd)

##                SpeedCategory
##                25to45 45to62 Gt62
## LeftOrRight  L 0      1      0
##              R 1      0      1
## NumThruLanes 1 0      0      1
##              2 0      1      0
##              3 1      0      0

If you have many columns to iterate over, you can build the formula programmatically, e.g.,

tabular(as.formula(paste(paste(names(exd)[-2], collapse = " + "),
                         names(exd)[2], sep = " ~ ")),
        data = exd)
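Indexing by position ([-2] and [2]) silently changes meaning if the column order changes. A variant that picks the grouping column by name, assuming it is called SpeedCategory, builds the same formula; group_col, row_vars and fml are illustrative names.

```r
exd <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1", header = TRUE)

group_col <- "SpeedCategory"                 # hypothetical: the column spread across
row_vars  <- setdiff(names(exd), group_col)  # everything else goes on the left-hand side

fml <- as.formula(paste(paste(row_vars, collapse = " + "), group_col, sep = " ~ "))
fml
# LeftOrRight + NumThruLanes ~ SpeedCategory
```

The resulting fml can then be passed to tabular(fml, data = exd) once the columns are factors, as in the answer above.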

As a bonus, there are html and latex methods that make it easy to mark up your tables for inclusion in articles or reports.


2016-12-23 20:36:26 Ista

This is exactly what I needed, thanks! In the end I had to convert all of the columns to factors with lapply(df, factor), and after that it worked fine. – ale19
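One detail about that conversion: lapply() on its own returns a plain list, so the result has to be assigned back into the data frame. A minimal sketch, assuming exd holds the example rows:

```r
exd <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1", header = TRUE)

converted <- lapply(exd, factor)
class(converted)  # "list": lapply() alone drops the data.frame class

# Assigning into exd[] replaces every column but keeps the data.frame structure
exd[] <- lapply(exd, factor)
str(exd)
```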

This won't do everything in one pass, but it may get you headed in the right direction:

library(reshape2) 

dcast(df, LeftOrRight ~ SpeedCategory, fun.aggregate = length) 
dcast(df, NumThruLanes ~ SpeedCategory, fun.aggregate = length)


2016-12-23 19:31:52 manotheshark

To do it with dcast from the reshape2 package, you can do this:

library("reshape2") 

DF=read.table(text="LeftOrRight SpeedCategory NumThruLanes 
R   25to45   3    
L   45to62   2   
R   Gt62   1",header=TRUE,stringsAsFactors=FALSE) 

LR_Stat = dcast(DF,LeftOrRight ~ SpeedCategory,length,fill=0)
LR_Stat
#   LeftOrRight 25to45 45to62 Gt62
# 1           L      0      1    0
# 2           R      1      0    1

Lanes_Stat = dcast(DF,NumThruLanes ~ SpeedCategory,length,fill=0)
Lanes_Stat
#   NumThruLanes 25to45 45to62 Gt62
# 1            1      0      0    1
# 2            2      0      1    0
# 3            3      1      0    0

Note that, going by LR_Stat, your expected output should have a 1 in the 45to62 column.
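To cover every column without naming each one, a base-R sketch can swap dcast() for xtabs(), which likewise fills combinations that never occur with 0. It assumes DF as defined above; group_col, row_cols and stats are illustrative names, and each element mirrors the shape of LR_Stat / Lanes_Stat rather than being identical to dcast's output (for example, the first column is character here).

```r
DF <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1", header = TRUE, stringsAsFactors = FALSE)

group_col <- "SpeedCategory"                # hypothetical name
row_cols  <- setdiff(names(DF), group_col)  # every column except the grouping one

stats <- lapply(row_cols, function(col) {
  # reformulate() builds ~ col + SpeedCategory; xtabs() counts each combination
  fml  <- reformulate(c(col, group_col))
  tab  <- xtabs(fml, data = DF)
  wide <- as.data.frame.matrix(tab)
  # Move the row labels into a first column named after the source column
  wide <- data.frame(rownames(wide), wide, check.names = FALSE,
                     row.names = NULL, stringsAsFactors = FALSE)
  names(wide)[1] <- col
  wide
})
names(stats) <- row_cols
stats$LeftOrRight
```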


2016-12-23 19:32:19 OdeToMyFiddle

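Fixed, thanks! This works, but I have a lot of columns to go through. Is there a way to do this without naming the columns explicitly? I tried looping and appending each object to a blank data frame, but that didn't seem to work... – ale19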

You can do it all with lapply() instead of a for loop:

tab_list <- lapply(df[, -2], function(col) table(col, df$SpeedCategory)) 
tab_list 
## $LeftOrRight 
##  
## col 25to45 45to62 Gt62 
## L  0  1 0 
## R  1  0 1 
## 
## $NumThruLanes 
##  
## col 25to45 45to62 Gt62 
## 1  0  0 1 
## 2  0  1 0 
## 3  1  0 0

You can then combine the tables into a single one using rbind() with do.call():

do.call(rbind, tab_list) 
## 25to45 45to62 Gt62 
## L  0  1 0 
## R  1  0 1 
## 1  0  0 1 
## 2  0  1 0 
## 3  1  0 0

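It is possible to get a column in the output table that shows the column name from the original data frame. To do that, you need a somewhat more complicated function inside lapply():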

tab_list <- lapply(names(df)[-2], function(col) { 
    tab <- table(df[, col], df[, "SpeedCategory"]) 
    name_col <- c(col, rep("", nrow(tab) - 1)) 
    mat <- cbind(name_col, rownames(tab), tab) 
    as.data.frame(mat) 
    }) 
do.call(rbind, tab_list) 
##       name_col V2 25to45 45to62 Gt62
## L  LeftOrRight  L      0      1    0
## R               R      1      0    1
## 1 NumThruLanes  1      0      0    1
## 2               2      0      1    0
## 3               3      1      0    0
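The cbind() call above mixes character vectors with the table, so the whole matrix (and therefore every count) becomes character. A sketch that keeps the counts numeric, assuming the same df, builds each piece as a data frame with separate variable and level columns; group_col, tab_list and result are illustrative names, and the variable name is repeated on every row instead of being blanked out.

```r
df <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1", header = TRUE)

group_col <- "SpeedCategory"  # hypothetical name for the grouping column

tab_list <- lapply(setdiff(names(df), group_col), function(col) {
  tab <- table(df[[col]], df[[group_col]])
  # as.data.frame.matrix() keeps the table layout with integer count columns
  data.frame(variable = col,
             level = rownames(tab),
             as.data.frame.matrix(tab),
             check.names = FALSE,  # keep names like "25to45" unchanged
             row.names = NULL,
             stringsAsFactors = FALSE)
})
result <- do.call(rbind, tab_list)
result
```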


2016-12-23 19:34:09 Stibu

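This looks promising. Is there a way to keep the column name (LeftOrRight, NumThruLanes, etc.) for each block of rows in the do.call() result (other than adding a column by hand), so that it looks more like my desired output? – ale19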
