2017-10-15

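Automating a function over many columns in R

Please forgive me, I am very new to this. If anyone can help, or point me to resources that would help, I would be most grateful.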

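I have a data table with 300,000 variables, some outcomes/symptoms (dependent variables) and some inputs (independent variables). For each symptom, I need descriptive statistics, as well as chi-squared test results for its association with each input.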

For the descriptive statistics, I have managed this by building a matrix of the outcome variables called symptom.matrix and using apply.

Desc.stats <- matrix(c(apply(symptom.matrix, 2, sum),
                       apply(symptom.matrix, 2, mean),
                       apply(symptom.matrix, 2, function(x)
                         sqrt(mean(x) * (1 - mean(x)) / length(x)))),
                     ncol = 3,
                     dimnames = list(c(...),
                                     c("N", "prev", "s.e.")))
Desc.stats
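The same three columns (count, prevalence, standard error) can also be sketched with vectorised column functions. The following is a minimal, self-contained illustration on simulated 0/1 data; the matrix sym, its column names and the simulated prevalence are hypothetical stand-ins, not the data described in the question.

```r
# Hypothetical stand-in for symptom.matrix: 0/1 indicators, one column per symptom
set.seed(1)
sym <- matrix(rbinom(500 * 3, size = 1, prob = 0.2), ncol = 3,
              dimnames = list(NULL, c("Vision", "Voice", "Paranoia")))

prev <- colMeans(sym)                       # prevalence of each symptom
desc <- cbind(N      = colSums(sym),        # number of positive cases
              prev   = prev,
              "s.e." = sqrt(prev * (1 - prev) / nrow(sym)))  # binomial standard error
desc
```

Because the column names of sym carry through to the row names of desc, no separate dimnames argument is needed in this variant.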

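To get the chi-squared results, I have used chisq.test on individual pairs of outcome and input in the following way, but I cannot see how to apply it to symptom.matrix.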

result1 <- chisq.test(symptom1, input1)
print(c(result1$statistic, result1$p.value))

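How can I extend this to the whole symptom matrix? Is it possible to use chisq.test, or would I be better off going back to basics and writing the statistical function myself?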

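Please explain how *symptoms* and *inputs* are identified in the data table. Are they prefixes/suffixes? Perhaps even show the original dataset, or post the following for us to run: 'dput(head(mydatatable))' – Parfait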

_symptoms_ and _input_ are not identified in the data table, so I pull the symptoms out of the data table like this: 'symptom.matrix <- with(mydatatable, matrix(c(Vision,Voice,Del,Parania,...), ncol = 8))' –

*? – Parfait

Answer

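Consider nested lapply calls that iterate over every combination of symptom column and input column and return a nested list. The objects passed to lapply are two subsets of the original data frame: all of the symptom columns and all of the input columns.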

Since the OP does not provide a sample of the actual data, the following demonstrates the approach with random data:

set.seed(788) 
symptoms <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26) 
colnames(symptoms) <- c("Vision.Symptom","Voice.Symptom","Delofreference.Symptom","Paranoia.Symptom", 
         "VisionorVoice.Symptom","Delusion.Symptom","UEAny.Symptom") 

set.seed(992) 
inputs <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26) 
colnames(inputs) <- c("Vision.Input","Voice.Input","Delofreference.Input","Paranoia.Input", 
         "VisionorVoice.Input","Delusion.Input","UEAny.Input") 

df <- data.frame(symptoms, inputs) 

# LIST OF 7 ITEMS, EACH NESTED WITH THE 7 INPUTS 
# CHANGE grep() to c() OF ACTUAL COLUMN NAMES 
chi_sq_list <- lapply(df[grep("\\.Symptom", names(df))], function(s)
  lapply(df[grep("\\.Input", names(df))], function(i) chisq.test(s, i)))

Output (of the first list item)

chi_sq_list$Vision.Symptom 

$Vision.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 241.22, df = 240, p-value = 0.4657 


$Voice.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 247, df = 240, p-value = 0.3644 


$Delofreference.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 289.25, df = 256, p-value = 0.07502 


$Paranoia.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 322.11, df = 288, p-value = 0.08131 


$VisionorVoice.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 215.22, df = 208, p-value = 0.351 


$Delusion.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 218.47, df = 224, p-value = 0.5916 


$UEAny.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 254.22, df = 256, p-value = 0.5196
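
To get the statistic and p-value for every pair in one table, as the question's print(c(result1$statistic, result1$p.value)) does for a single pair, the nested list can be flattened into a data frame. The following is only a sketch under stated assumptions: it rebuilds a small simulated data frame so that it runs on its own, and the names toy, tests and results, along with the toy column names, are hypothetical.

```r
# Simulated stand-in data (hypothetical): two symptom and two input columns
set.seed(42)
toy <- data.frame(
  A.Symptom = sample(c("yes", "no"), 200, replace = TRUE),
  B.Symptom = sample(c("yes", "no"), 200, replace = TRUE),
  X.Input   = sample(c("low", "mid", "high"), 200, replace = TRUE),
  Y.Input   = sample(c("low", "mid", "high"), 200, replace = TRUE)
)

# Same nested lapply pattern as above: outer level symptoms, inner level inputs
tests <- lapply(toy[grep("\\.Symptom$", names(toy))], function(s)
  lapply(toy[grep("\\.Input$", names(toy))], function(i) chisq.test(s, i)))

# Flatten to one row per symptom/input pair
results <- do.call(rbind, lapply(names(tests), function(sn)
  do.call(rbind, lapply(names(tests[[sn]]), function(inp) {
    res <- tests[[sn]][[inp]]
    data.frame(symptom   = sn,
               input     = inp,
               statistic = unname(res$statistic),  # X-squared
               df        = unname(res$parameter),  # degrees of freedom
               p.value   = res$p.value)
  }))))
results
```

With real data, toy would be replaced by the actual data frame and the grep() calls by vectors of the actual column names, as the comment in the answer's code suggests.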