2017-08-08

Chi-square test on a data frame with wide-format data

I have data that looks like this:

ID gamesAlone gamesWithOthers gamesRemotely tvAlone tvWithOthers tvRemotely
 1          1                                                  1
 2                                        1                    1
 3                                        1       1
 4                                        1       1
 5                                        1                    1
 6                                        1       1
 7                                        1       1
 8                          1                                  1
 9          1                                                             1

I would like code that does the following two things:

First, transform the data into a tidy contingency table like this:

      Alone WithOthers Remotely
games     2          1        6
tv        4          4        1

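Second, use a chi-square test to see whether these activities (games vs. TV) differ in their social context.

Here is the code to generate the data frame: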

data<-data.frame(ID=c(1,2,3,4,5,6,7,8,9), 
      gamesAlone=c(1,NA,NA,NA,NA,NA,NA,NA,1), 
      gamesWithOthers=c(NA,NA,NA,NA,NA,NA,NA,1,NA), 
      gamesRemotely=c(NA,1,1,1,1,1,1,NA,NA), 
      tvAlone=c(NA,NA,1,1,NA,1,1,NA,NA), 
      tvWithOthers=c(1,1,NA,NA,1,NA,NA,1,NA), 
      tvRemotely=c(NA,NA,NA,NA,NA,NA,NA,NA,1)) 

Answers

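Drop the first column, ID (data[-1]), then take the sum of each column (colSums) while removing NA values (na.rm=TRUE), and put the resulting length-6 vector into a matrix with 2 rows, filled row by row (byrow=TRUE). If needed, you can also label the matrix dimensions accordingly (the dimnames argument):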

m <- matrix(
    colSums(data[-1], na.rm = TRUE),   # column totals, ignoring NA
    nrow = 2, byrow = TRUE,            # first three totals are games, last three are tv
    dimnames = list(c("games", "tv"), c("alone", "withOthers", "remotely"))
)
m
#       alone withOthers remotely
# games     2          1        6
# tv        4          4        1
chisq.test(m)
#
#  Pearson's Chi-squared test
#
# data:  m
# X-squared = 6.0381, df = 2, p-value = 0.04885
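
With counts this small, the chi-square approximation is shaky, and R warns about it when expected cell counts fall below 5. The following is only a sketch of how one might check the expected counts and, as a possible alternative that the answer above does not itself propose, run Fisher's exact test on the same matrix. The matrix is typed in by hand here so the snippet runs on its own.

```r
# Same 2 x 3 table as above, entered directly so the snippet is self-contained.
m <- matrix(
  c(2, 1, 6,
    4, 4, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("games", "tv"), c("alone", "withOthers", "remotely"))
)

# chisq.test() warns that the approximation may be incorrect for this table;
# the warning is suppressed here only to keep the output tidy.
res <- suppressWarnings(chisq.test(m))

# Expected counts under independence: row total * column total / grand total.
res$expected

# Fisher's exact test does not rely on the large-sample approximation.
fisher.test(m)
```

Every expected count in this table is below 5, so the p-value of 0.04885 from chisq.test should be read with caution.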

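This will get you the contingency table in the form you gave. A suggestion: name your data frame data1 rather than data to avoid confusion, since data is also the name of a built-in R function.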

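library(dplyr)
library(tidyr)
# data1 is the data frame from the question, renamed as suggested above
data1_table <- data1 %>%
    # wide to long: one row per ID and column name
    gather(key, value, -ID) %>%
    # split each column name into activity ("games"/"tv") and context
    mutate(activity = ifelse(grepl("^tv", key), substring(key, 1, 2), substring(key, 1, 5)),
           context = ifelse(grepl("^tv", key), substring(key, 3), substring(key, 6))) %>%
    group_by(activity, context) %>%
    # count the 1s in each activity/context cell, ignoring NA
    summarise(n = sum(value, na.rm = TRUE)) %>%
    ungroup() %>%
    # long to wide: one column per context
    spread(context, n)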

# A tibble: 2 x 4
  activity Alone Remotely WithOthers
*    <chr> <dbl>    <dbl>      <dbl>
1    games     2        6          1
2       tv     4        1          4
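
The substring() positions in the pipeline above are hard-coded to the lengths of "games" and "tv". As a rough sketch of the same idea without any packages, the column names could instead be split with a regular expression and tabulated with xtabs. This assumes every column name is an activity prefix (games or tv) followed directly by the context; the variable names counts, activity, context, and tab are made up for this illustration.

```r
# Recreate the question's data frame (named data1, as suggested above).
data1 <- data.frame(
  ID              = 1:9,
  gamesAlone      = c(1, NA, NA, NA, NA, NA, NA, NA, 1),
  gamesWithOthers = c(NA, NA, NA, NA, NA, NA, NA, 1, NA),
  gamesRemotely   = c(NA, 1, 1, 1, 1, 1, 1, NA, NA),
  tvAlone         = c(NA, NA, 1, 1, NA, 1, 1, NA, NA),
  tvWithOthers    = c(1, 1, NA, NA, 1, NA, NA, 1, NA),
  tvRemotely      = c(NA, NA, NA, NA, NA, NA, NA, NA, 1)
)

# Column totals, ignoring NA and dropping the ID column.
counts <- colSums(data1[-1], na.rm = TRUE)

# Assumption: each name is "games" or "tv" followed directly by the context.
activity <- sub("^(games|tv).*$", "\\1", names(counts))  # keep the prefix
context  <- sub("^(games|tv)", "", names(counts))        # drop the prefix

# Cross-tabulate the totals, then put the columns in the order from the question.
tab <- xtabs(counts ~ activity + context)
tab <- tab[, c("Alone", "WithOthers", "Remotely")]
tab
```

The resulting two-way table can be passed to chisq.test directly, without selecting columns first.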

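As for the chi-square test: it depends on what you want to compare, and I assume your actual data has higher counts. You can pipe the whole table into chisq.test like this, but I don't think the result is very informative: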

data1_table %>% 
    select(2:4) %>% 
    chisq.test()
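
One way to get more out of the test than a single p-value is to look at the residuals that chisq.test returns, which show which cells depart most from independence. This is only a sketch of that idea, not something the answer above spells out; the matrix m is typed in by hand so the snippet runs without dplyr or tidyr.

```r
# The 2 x 3 contingency table from the question, entered directly.
m <- matrix(
  c(2, 1, 6,
    4, 4, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("games", "tv"), c("Alone", "WithOthers", "Remotely"))
)

# Warning about small expected counts suppressed to keep the output tidy.
res <- suppressWarnings(chisq.test(m))

# Pearson residuals: (observed - expected) / sqrt(expected).
# Positive values mean more observations than independence would predict.
round(res$residuals, 2)

# Standardized residuals, scaled to be roughly comparable to z-scores.
round(res$stdres, 2)
```

In this table the largest residuals sit in the Remotely column: positive for games and negative for tv.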