Summarise and aggregate column values as rows in R

My data frame consists mainly of categorical columns plus one numeric column. The df looks like this (simplified):

**Home_type**  **Garden_type**  **NaighbourhoOd** **Rent** 
Vila    big     brooklyn    5000 
Vila    small    bronx    7000 
Condo    shared    Sillicon valley  2000 
Appartment   none     brooklyn    500 
Condo    none     bronx    1700 
Appartment   none     Sillicon Valley  800
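For anyone who wants to reproduce this, here is a minimal sketch of the sample above as a data frame. The name `df` is an assumption (it matches the name used in the first answer), and the spellings and letter case are copied exactly from the table.

```r
# Sample data from the question, entered so it can be pasted straight into R.
# Spellings and letter case are copied as-is from the table above.
df <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000, 7000, 2000, 500, 1700, 800),
  stringsAsFactors = FALSE  # keep the categorical columns as plain character
)

str(df)
```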

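For each categorical column, I want to show all of its distinct values, how often each one occurs, and the sum of the rent associated with it.

The result should look like this: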

**Variable**  **Distinct_values**  **No_of-Occurences**  **SUM_RENT** 
    Home_type  Vila      2      12000 
    Home_type  Condo     2      3700 
    Home_type  Appartment    2      1300 
    Garden_type  big      1      5000 
    Garden_type  small     1      7000 
    Garden_type  shared     1      2000 
    Garden_type  none      3      3000 
    Naighbourhood brooklyn     2      5500 
    Naighbourhood Bronx     2      8700 
    Naighbourhood Sillicon Valley   2      2800

I am new to R and tried to do this using melt from reshape2, but I have not made much progress so far. Any help would be much appreciated.

Source

2016-11-04 Ali Zia

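You might want to have a look at [this overview of asking good R questions](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), particularly the parts about making data easy to read in. It is easier to help if we do not have to work out how to get the data into R.

Thanks for pointing me to it, Mark. I will definitely be more careful in future and will edit this post later.

I tend to prefer tidyr to the older reshape2, though that is mostly because its syntax is closer to dplyr. dplyr also makes this task much easier, thanks to the magrittr pipe (%>%) that it loads and its data summarising tools.

First, gather (from tidyr) all of the non-Rent columns into long format (run just the first two lines of the pipeline to see the result). Then group_by the columns you want to aggregate over. Finally, summarise within each group to get the desired metrics.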

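library(dplyr)
library(tidyr)

df %>%
    gather(Variable, Distinct_Values, -Rent) %>%   # all non-Rent columns to long format
    group_by(Variable, Distinct_Values) %>%
    summarise(
        `No_of-Occurences` = n(),      # how often each value occurs
        SUM_RENT = sum(Rent)           # total rent for each value
    )

This gives: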

 Variable Distinct_Values `No_of-Occurences` SUM_RENT 
      <chr>   <chr>    <int> <int> 
1 Garden_type    big     1  5000 
2 Garden_type   none     3  3000 
3 Garden_type   shared     1  2000 
4 Garden_type   small     1  7000 
5  Home_type  Appartment     2  1300 
6  Home_type   Condo     2  3700 
7  Home_type   Vila     2 12000 
8 NaighbourhoOd   bronx     2  8700 
9 NaighbourhoOd  brooklyn     2  5500 
10 NaighbourhoOd Sillicon valley     1  2000 
11 NaighbourhoOd Sillicon Valley     1  800

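(Note that your data has "Sillicon valley" written with both a lowercase "v" and an uppercase "V", which leads to two separate rows.)

Source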
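If the two spellings are meant to be the same place, one option is to normalise the case before grouping. The following is only a sketch under a few assumptions: the data frame is rebuilt here so the block runs on its own, `tolower()` is applied to every value (so "Vila" becomes "vila" as well), `pivot_longer()` is used as the tidyr 1.0.0+ replacement for `gather()`, and the `.groups` argument needs dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, including the inconsistent "valley"/"Valley"
df <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000, 7000, 2000, 500, 1700, 800),
  stringsAsFactors = FALSE
)

result <- df %>%
  # long format: one row per (column name, value, Rent)
  pivot_longer(-Rent, names_to = "Variable", values_to = "Distinct_Values") %>%
  # fold everything to lower case so "valley" and "Valley" land in one group
  mutate(Distinct_Values = tolower(Distinct_Values)) %>%
  group_by(Variable, Distinct_Values) %>%
  summarise(
    `No_of-Occurences` = n(),
    SUM_RENT = sum(Rent),
    .groups = "drop"
  )

print(result)
```

With the case folded, the two neighbourhood rows collapse into one, which matches the count of 2 and rent total of 2800 asked for in the question.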

2016-11-04 12:36:29

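Very nice, thank you. I was not aware of tidyr and will definitely be using it more.

We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)) and melt it from 'wide' to 'long' format. Grouped by 'variable' and 'value' (the columns created by melt), we get the number of rows (.N) and the sum of the 'Rent' column. Then, grouped by 'variable', 'No_of_occur' and 'SUM_RENT', we take the unique elements of the 'value' column ('Distinct_values').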

library(data.table) 
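melt(setDT(df1), id.vars = "Rent")[
    # add the row count and rent total for each variable/value pair
    , c("No_of_occur", "SUM_RENT") := .(.N, sum(Rent)), by = .(variable, value)][
    # collapse to one row per distinct value
    , .(Distinct_values = unique(value)), by = .(variable, No_of_occur, SUM_RENT)]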
#   variable No_of_occur SUM_RENT Distinct_values 
#1:  Home_type   2 12000   Vila 
#2:  Home_type   2  3700   Condo 
#3:  Home_type   2  1300  Appartment 
#4: Garden_type   1  5000    big 
#5: Garden_type   1  7000   small 
#6: Garden_type   1  2000   shared 
#7: Garden_type   3  3000   none 
#8: NaighbourhoOd   2  5500  brooklyn 
#9: NaighbourhoOd   2  8700   bronx 
#10:NaighbourhoOd   2  2800 Sillicon Valley
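The same aggregation can also be written as a single grouped summary instead of adding columns with `:=` and then de-duplicating. This is a sketch of that alternative rather than the answer's own code. It assumes `df1` holds the question's sample exactly as posted, so the two spellings of "Sillicon valley" stay separate here, and it uses `as.data.table()` to avoid modifying `df1` in place the way `setDT()` does.

```r
library(data.table)

# Sample data from the question, spelled exactly as posted
df1 <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000, 7000, 2000, 500, 1700, 800),
  stringsAsFactors = FALSE
)

# wide -> long: every non-Rent column becomes (Variable, Distinct_values) pairs
long <- melt(as.data.table(df1), id.vars = "Rent",
             variable.name = "Variable", value.name = "Distinct_values")

# one row per variable/value pair, with its frequency and rent total
result <- long[, .(No_of_occur = .N, SUM_RENT = sum(Rent)),
               by = .(Variable, Distinct_values)]

print(result)
```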

Source

2016-11-04 12:38:04 akrun
