Automating a function performed on many columns in R

Please forgive me, I am new to this. If anyone can help, or point me to resources that would, I would be most grateful.

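I have a data table with 300,000 variables, a few outcomes/symptoms (dependent variables) and a few inputs (independent variables). For each symptom I need descriptive statistics, plus the result of a chi-squared test of association with each input.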

For the descriptive statistics, I have managed this by building a matrix of the outcome variables called `symptom.matrix` and using `apply`:

Desc.stats <- matrix(c(apply(symptom.matrix, 2, sum),
                       apply(symptom.matrix, 2, mean),
                       apply(symptom.matrix, 2, function(x)
                         sqrt(mean(x) * (1 - mean(x)) / length(x)))),
                     ncol = 3,
                     dimnames = list(c(...),
                                     c("N", "prev", "s.e.")))
Desc.stats
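For comparison, the same three columns can be sketched with `colSums` and `colMeans`, which avoids the repeated `apply` calls. This is only a minimal sketch: `toy.matrix`, its column names, and the 0/1 coding of the symptoms are assumptions made for illustration, not the actual data.

```r
# Hypothetical 0/1 symptom matrix (an assumption; not the asker's data)
set.seed(42)
toy.matrix <- matrix(rbinom(40, 1, 0.3), ncol = 4,
                     dimnames = list(NULL, c("A", "B", "C", "D")))

n    <- nrow(toy.matrix)      # number of rows (observations)
prev <- colMeans(toy.matrix)  # prevalence of each symptom

# One row per symptom: count of positives, prevalence, binomial standard error
toy.stats <- cbind(N    = colSums(toy.matrix),
                   prev = prev,
                   s.e. = sqrt(prev * (1 - prev) / n))
toy.stats
```

Because `cbind` takes the row names from the column names of the matrix, the `dimnames` argument is not needed here.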

To get the chi-squared results, I have used `chisq.test` on individual outcome/input pairs as follows, but I cannot see how to apply it to `symptom.matrix`:

result1<-(chisq.test(symptom1,input1)); 
print (c(result1$statistic, result1$p.value))

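How can I extend this to cover all of `symptom.matrix`? Is it possible to use `chisq.test`, or would I be better off going back to basics and writing the statistical function myself?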

2017-10-15 Katrina Davis

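Please explain how *symptoms* and *inputs* are identified in the data table. Are they prefixes/suffixes? Perhaps even show the original dataset, or post this for us to run: `dput(head(mydatatable))` – Parfait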

_symptoms_ and _inputs_ are not identified in the data table, so I pull the symptoms out of the data table like this: `symptom.matrix <- with(mydatatable, matrix(c(Vision, Voice, Del, Parania, ...), ncol = 8))`

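Consider nested `lapply` calls that iterate over each symptom across every input column and return a nested list. The objects passed to `lapply` are two subsets of the original data frame: all of the symptom columns and all of the input columns.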

Since the OP does not provide a sample of the actual data, the following demonstrates with random data:

set.seed(788) 
symptoms <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26) 
colnames(symptoms) <- c("Vision.Symptom","Voice.Symptom","Delofreference.Symptom","Paranoia.Symptom", 
         "VisionorVoice.Symptom","Delusion.Symptom","UEAny.Symptom") 

set.seed(992) 
inputs <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26) 
colnames(inputs) <- c("Vision.Input","Voice.Input","Delofreference.Input","Paranoia.Input", 
         "VisionorVoice.Input","Delusion.Input","UEAny.Input") 

df <- data.frame(symptoms, inputs) 

# LIST OF 7 ITEMS, EACH NESTED WITH THE 7 INPUTS 
# CHANGE grep() to c() OF ACTUAL COLUMN NAMES 
chi_sq_list <- lapply(df[grep("\\.Symptom", names(df))], function(s) 
         lapply(df[grep("\\.Input", names(df))], function(i) chisq.test(s,i)))

Output (of the first list item)

chi_sq_list$Vision.Symptom 

$Vision.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 241.22, df = 240, p-value = 0.4657 


$Voice.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 247, df = 240, p-value = 0.3644 


$Delofreference.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 289.25, df = 256, p-value = 0.07502 


$Paranoia.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 322.11, df = 288, p-value = 0.08131 


$VisionorVoice.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 215.22, df = 208, p-value = 0.351 


$Delusion.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 218.47, df = 224, p-value = 0.5916 


$UEAny.Input 

    Pearson's Chi-squared test 

data: s and i 
X-squared = 254.22, df = 256, p-value = 0.5196
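
To get from the nested list back to the statistic and p-value the question prints for a single pair, one option is to flatten the results into a data frame with one row per symptom/input pair. The following is only a sketch under stated assumptions: `toy`, its column names, and the helper `extract_field` are hypothetical and stand in for the real data.

```r
# Hypothetical data: two binary symptoms and two categorical inputs
# (assumed for illustration; not the asker's data)
set.seed(1)
n <- 200
toy <- data.frame(
  Vision.Symptom = rbinom(n, 1, 0.3),
  Voice.Symptom  = rbinom(n, 1, 0.2),
  Sex.Input      = sample(c("F", "M"), n, replace = TRUE),
  Age.Input      = sample(c("young", "mid", "old"), n, replace = TRUE)
)

symptom_cols <- grep("\\.Symptom$", names(toy), value = TRUE)
input_cols   <- grep("\\.Input$",   names(toy), value = TRUE)

# Nested lapply: one chisq.test per symptom/input pair
tests <- lapply(toy[symptom_cols], function(s)
  lapply(toy[input_cols], function(i) chisq.test(s, i)))

# Every symptom/input combination, one per row
pairs <- expand.grid(input = input_cols, symptom = symptom_cols,
                     stringsAsFactors = FALSE)

# Hypothetical helper: pull one numeric field out of each htest object
extract_field <- function(field) {
  vapply(seq_len(nrow(pairs)), function(k) {
    unname(tests[[pairs$symptom[k]]][[pairs$input[k]]][[field]])
  }, numeric(1))
}

chi_sq_table <- data.frame(
  symptom   = pairs$symptom,
  input     = pairs$input,
  statistic = extract_field("statistic"),
  df        = extract_field("parameter"),
  p.value   = extract_field("p.value")
)
chi_sq_table
```

Note that for 2 x 2 tables `chisq.test` applies Yates' continuity correction by default; pass `correct = FALSE` if the uncorrected statistic is wanted.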

2017-10-15 15:51:34 Parfait
