2015-03-31

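R element frequency and index

I was looking at the code in the example

r element frequency and column name

and would like to know whether there is any way to show the index of each element in each column, in addition to the ranking and frequency, in R. So, for example, the desired input is: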

df <- read.table(header = TRUE, text = 'A B C D
a a b c
b c x e
c d y a
d NA NA z
e NA NA NA
f NA NA NA', stringsAsFactors = FALSE)

and the desired output is:

   element frequency columns ranking  A  B  C  D
1        a         3   A,B,D       1  1  1 na  2
3        c         3   A,B,D       1  3  2 na  1
2        b         2     A,C       2  2 na  1 na
4        d         2     A,B       2  4  3 na na
5        e         2     A,D       2  5 na na  2
6        f         1       A       3  6 na na na
8        x         1       C       3 na na  2 na
9        y         1       C       3 na na  3 na
10       z         1       D       3 na na na  3

Thanks.


I think some of the values in your sample output are incorrect. – A5C1D2H2I1M1N2O1R2T1 2015-03-31 06:33:03
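One quick way to check which expected index values are right is base R's match(), which returns the position of the first occurrence of each element in a column (NA if the element is absent). This is a minimal sketch, assuming each value occurs at most once per column (true for this df); the names elems and idx are illustrative only.

```r
# Re-create the question's data so the snippet runs on its own
df <- read.table(header = TRUE, text = 'A B C D
a a b c
b c x e
c d y a
d NA NA z
e NA NA NA
f NA NA NA', stringsAsFactors = FALSE)

# All distinct values; sort() drops NA by default (na.last = NA)
elems <- sort(unique(unlist(df)))

# For every column, the row where each element first appears (NA if absent)
idx <- sapply(df, function(col) match(elems, col))
rownames(idx) <- elems
idx
```

Comparing idx with the desired output shows, for example, that a sits in row 3 of D (not 2) and z in row 4 of D (not 3).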

Answer


There may be a way to do this in a single step, but nothing comes to mind at the moment. So, continuing from my previous answer:

library(dplyr) 
library(tidyr) 

step1 <- df %>%
    gather(var, val, everything()) %>%           ## Make a long dataset (one row per cell)
    na.omit %>%                                  ## We don't need the NA cells
    group_by(val) %>%                            ## All calculations are grouped by val
    summarise(column = toString(var),            ## Collapse the column names into one string
              freq = n()) %>%                    ## Count how many columns each value appears in
    mutate(ranking = dense_rank(desc(freq)))     ## Rank by frequency; ties share a rank

step2 <- df %>%
    mutate(ind = seq_len(nrow(df))) %>%          ## Add a row-index column
    gather(var, val, -ind) %>%                   ## Go long, keeping the row index
    na.omit %>%                                  ## Remove NA
    spread(var, ind)                             ## Go wide: one column per original column, holding the row index

inner_join(step1, step2) 
# Joining by: "val" 
# Source: local data frame [9 x 8] 
# 
#   val  column freq ranking  A  B  C  D
# 1   a A, B, D    3       1  1  1 NA  3
# 2   b    A, C    2       2  2 NA  1 NA
# 3   c A, B, D    3       1  3  2 NA  1
# 4   d    A, B    2       2  4  3 NA NA
# 5   e    A, D    2       2  5 NA NA  2
# 6   f       A    1       3  6 NA NA NA
# 7   x       C    1       3 NA NA  2 NA
# 8   y       C    1       3 NA NA  3 NA
# 9   z       D    1       3 NA NA NA  4
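The two steps can also be folded into a single pipeline by computing column and freq with a grouped mutate() instead of summarise(), so the per-cell rows survive for the final widening step and no join is needed. This sketch is an alternative written for illustration, not the answer's own code, and it assumes tidyr >= 1.0, where pivot_longer() and pivot_wider() supersede gather() and spread(); the name result is illustrative.

```r
library(dplyr)
library(tidyr)

df <- read.table(header = TRUE, text = 'A B C D
a a b c
b c x e
c d y a
d NA NA z
e NA NA NA
f NA NA NA', stringsAsFactors = FALSE)

result <- df %>%
  mutate(ind = row_number()) %>%                  # row index within each column
  pivot_longer(-ind, names_to = "var", values_to = "val",
               values_drop_na = TRUE) %>%         # one row per non-NA cell
  group_by(val) %>%
  mutate(column = toString(sort(var)),            # columns the value occurs in
         freq = n()) %>%                          # number of such columns
  ungroup() %>%
  mutate(ranking = dense_rank(desc(freq))) %>%    # ties share a rank
  pivot_wider(names_from = var, values_from = ind,
              names_sort = TRUE) %>%              # one index column per A, B, C, D
  arrange(val)

result
```

sort(var) is used because pivot_longer() emits cells row by row rather than column by column as gather() does, so without it c would be listed as "D, B, A" instead of "A, B, D".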