2017-07-14

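How do I get a frequency count of two variables in R?

I'm looking for a way to get frequency counts from an R data frame based on two values. I've tried a few different syntaxes, and I'm fairly new to R.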

> table(frequency.data.frame$value,frequency.data.frame$value_x)[!is.na(frequency.data.frame$id),] 
Error in `[.default`(table(frequency.data.frame$value, frequency.data.frame$value_x), : 
    (subscript) logical subscript too long 
> table(frequency.data.frame$value,frequency.data.frame$value_x[!is.na(frequency.data.frame$id),]) 
Error in frequency.data.frame$value_x[!is.na(frequency.data.frame$id), : 
    incorrect number of dimensions 

Counting one variable at a time (rows with a non-NA id only) does work.

First dimension:

as.data.frame(table(frequency.data.frame[!is.na(frequency.data.frame$id),]$value)) 
    Var1 Freq 
1  2 2 
2  3 2 
3  4 5 
4  5 21 
5  6 8 
6  7 19 
7  8 52 
8  9 33 
9 10 56 
10 11 1 
11 12 1 

Second dimension:

as.data.frame(table(frequency.data.frame[!is.na(frequency.data.frame$id),]$value_x)) 
    Var1 Freq 
1  1 50 
2  2 17 
3  3 12 
4  4 7 
5  6 18 
6  8 6 
7  9 1 
8 10 19 
9 14 1 
10 15 1 
11 16 11 
12 17 2 
13 18 2 
14 96 3 
15 97 4 
16 98 46 

A sample extract from the data frame:

> frequency.data.frame 
            id name               factor value value_x 
1        <NA>          OSuppl=1 - Ardex | Imp_1=1 - 1  1  1 
2        <NA>          OSuppl=1 - Ardex | Imp_1=2 - 2  2  1 
3 e7f0940c64001d4ab9d43ebd1e361292          OSuppl=1 - Ardex | Imp_1=3 - 3  3  1 
4        <NA>          OSuppl=1 - Ardex | Imp_1=4 - 4  4  1 
5 2de771a03f49ce72eb721159933d4827          OSuppl=1 - Ardex | Imp_1=5 - 5  5  1 
6 307ad612c3cc9fe5741c1fe75d1bc217          OSuppl=1 - Ardex | Imp_1=5 - 5  5  1 
7 522f594612678f13f9dd5ee8f4f24df7          OSuppl=1 - Ardex | Imp_1=5 - 5  5  1 
8 c1c32ac37f572fb259fe4e454bbdf743          OSuppl=1 - Ardex | Imp_1=5 - 5  5  1 
9 d5b784d8f9508da7ac9573b535fe7147          OSuppl=1 - Ardex | Imp_1=5 - 5  5  1 
10 e07439cdc15377d209413b31d9f80056          OSuppl=1 - Ardex | Imp_1=6 - 6  6  1 
11 878a67dbbb428c65c83602fc112a24a0          OSuppl=1 - Ardex | Imp_1=6 - 6  6  1 
12 5f7c27fb104685c26e53fc3267024539          OSuppl=1 - Ardex | Imp_1=7 - 7  7  1 
13 6b12a3591d89f7b70587406a0c4f92bb          OSuppl=1 - Ardex | Imp_1=7 - 7  7  1 
14 7fb2f98867e0e100187f0b4f13baac46          OSuppl=1 - Ardex | Imp_1=7 - 7  7  1 
15 99a0ffaa2066e5c4806f2e30a446a31f          OSuppl=1 - Ardex | Imp_1=7 - 7  7  1 
16 9d214544e8eaf3ea9c416a3dfbddb9f6          OSuppl=1 - Ardex | Imp_1=7 - 7  7  1 
17 b36f990b1e0d8c5f04a47d23b70c1022          OSuppl=1 - Ardex | Imp_1=7 - 7  7  1 
18 f2f9395bd9ddc16acd2253bd114aca64          OSuppl=1 - Ardex | Imp_1=7 - 7  7  1 
19 4420e8499ab32631b389111935314468          OSuppl=1 - Ardex | Imp_1=8 - 8  8  1 
... 

An extract of the desired result, for example:

Var2 Var1 Freq 
... 
6  5 1 5 
7  6 1 2 
8  7 1 7 
9  8 1 1 
... 

What syntax do I need to get output like the desired result above?


You can first filter on 'id' and then do a 'table', i.e. as.data.frame(table(subset(frequency.data.frame, select = c('value', 'value_x'), !is.na(id)))) – akrun


Thanks very much, @akrun. That works. Do you want to post it as an answer? –

Answers


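Since we only want the frequencies of 'value' and 'value_x' for rows with a non-NA 'id', subset the rows, select the columns of interest, get the table, and convert it to a data.frame: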

as.data.frame(table(subset(frequency.data.frame, 
      select = c('value', 'value_x'), !is.na(id)))) 
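For reference, the two attempts in the question fail for different reasons. In the first, table(...) returns a table with one row per distinct value, not one row per data-frame row, so the logical vector !is.na(frequency.data.frame$id) (one entry per data-frame row) is longer than the table's first dimension, hence "logical subscript too long". In the second, frequency.data.frame$value_x is a plain vector, so the two-index form [i, ] has the wrong number of dimensions. A minimal sketch of the direct fix, assuming a small hypothetical data frame df in place of frequency.data.frame, builds the logical index once and applies it to both vectors before calling table():

```r
# Hypothetical toy data standing in for frequency.data.frame
df <- data.frame(
  id      = c(NA, NA, "a1", NA, "b2", "c3", "d4", "e5"),
  value   = c(1, 2, 3, 4, 5, 5, 6, 6),
  value_x = c(1, 1, 1, 1, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# One logical entry per data-frame row: TRUE where id is present
keep <- !is.na(df$id)

# Filter BOTH vectors with the same index before tabulating, so the
# subsetting is applied to rows of the data, not rows of the table
tab <- table(value = df$value[keep], value_x = df$value_x[keep])

# Long format: one row per (value, value_x) combination with its count
freq <- as.data.frame(tab)
print(freq)
```

This gives the same counts as the subset() version, since both drop the NA-id rows first and tabulate second.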

The tidyverse equivalent of the subset()/table() solution is:

library(dplyr) 
frequency.data.frame %>% 
     filter(!is.na(id)) %>% 
     count(var1 = value, var2 = value_x) 
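One difference between the two versions: as.data.frame(table(...)) returns a row for every combination of value and value_x, including combinations with Freq 0, whereas dplyr's count() only returns combinations that actually occur. A minimal base-R sketch, again on hypothetical toy data, that drops the zero rows so the result has the same rows as count() (though not necessarily in the same order):

```r
# Hypothetical toy data standing in for frequency.data.frame
df <- data.frame(
  id      = c(NA, "a1", "b2", "c3", "d4"),
  value   = c(2, 3, 3, 4, 4),
  value_x = c(1, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Full cross-tabulation of the non-NA-id rows, zero cells included
all_combos <- as.data.frame(
  table(subset(df, !is.na(id), select = c("value", "value_x")))
)

# Keep only combinations that actually occur, as count() does
observed <- all_combos[all_combos$Freq > 0, ]
rownames(observed) <- NULL
print(observed)
```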
library(plyr) 
# Group by value_x and value; nrow gives the number of rows in each group
counts <- ddply(frequency.data.frame, .(value_x, value), nrow) 
names(counts) <- c("value_x", "value", "Freq") 

     value_x value Freq 
    1   1  1 1 
    2   1  2 1 
    3   1  3 1 
    4   1  4 1 
    5   1  5 5 
    6   1  6 2 
    7   1  7 7 
    8   1  8 10 
    9   1  9 9 
    10  1 10 15 
    11  1 11 1 
    12  1 12 1 
    13  2  1 1 
    ... 
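Note that this ddply() call does not drop rows whose id is NA, so its counts include those rows; for example, its output contains value = 1, which does not appear in the question's NA-filtered count of value. A base-R sketch that needs no extra package and counts only non-NA ids, assuming hypothetical toy data: aggregate() with a formula uses na.action = na.omit by default, which drops any row where a variable in the formula, including id, is NA.

```r
# Hypothetical toy data standing in for frequency.data.frame
df <- data.frame(
  id      = c(NA, "a1", "b2", "c3", "d4", NA),
  value   = c(5, 5, 5, 6, 6, 6),
  value_x = c(1, 1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# The formula method's default na.action (na.omit) removes rows where any
# formula variable is NA, so NA ids are excluded; FUN = length then
# counts the remaining rows in each (value_x, value) group
counts <- aggregate(id ~ value_x + value, data = df, FUN = length)
names(counts) <- c("value_x", "value", "Freq")
print(counts)
```

Like count(), aggregate() only returns groups that occur, so there are no Freq 0 rows.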