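How do I calculate the relative risk for all pairs of conditions in a matrix in R?

I have a data frame in R in which each row is an individual and each column is a disease code. Each cell contains 1 or 0 to indicate whether the individual has that disease. For each disease code X, I want to separate the individuals who have disease X from those who do not. Then I want to calculate the relative risk that patients with disease X also have disease Y or disease Z. Below are sample data and my approach: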
# generate reproducible dataframe with disease diagnoses
set.seed(2)
ID = c(0:19)
disease0 = c(rbinom(10, 1, 0.0), rbinom(10, 1, 1.0))
disease1 = c(rbinom(10, 1, 0.1), rbinom(10, 1, 0.9))
disease2 = c(rbinom(10, 1, 0.5), rbinom(10, 1, 0.5))
disease3 = c(rbinom(10, 1, 0.9), rbinom(10, 1, 0.1))
disease4 = c(rbinom(10, 1, 1.0), rbinom(10, 1, 0.0))
(disease.df = data.frame(cbind(ID, disease0, disease1, disease2, disease3, disease4)))
row.names(disease.df) = disease.df[ ,1]
disease.df[ ,1] = NULL
disease.df
   disease0 disease1 disease2 disease3 disease4
0         0        0        1        0        1
1         0        0        0        1        1
2         0        0        1        1        1
3         0        0        0        1        1
4         0        1        0        0        1
5         0        1        0        1        1
6         0        0        0        0        1
7         0        0        0        1        1
8         0        0        1        1        1
9         0        0        0        1        1
10        1        1        0        0        0
11        1        1        0        0        0
12        1        1        1        0        0
13        1        1        1        1        0
14        1        1        1        0        0
15        1        1        1        0        0
16        1        0        1        0        0
17        1        1        0        1        0
18        1        1        1        0        0
19        1        1        0        0        0
I can use the code below to calculate the relative risk that individuals with disease 0 also have diseases 1 through 4:
library(dplyr)  # filter() below is dplyr::filter, not stats::filter
colMeans(filter(disease.df, disease0 == 1)) / colMeans(filter(disease.df, disease0 != 1))
 disease0  disease1  disease2  disease3  disease4
      Inf 4.5000000 2.0000000 0.2857143 0.0000000
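My question is: is there a way to do this for all 5 diseases using vectorized operations or apply functions, while avoiding for loops? Ideally, I would like to produce a table like this: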
          disease0  disease1  disease2  disease3  disease4
disease0       Inf 4.5000000 2.0000000 0.2857143 0.0000000
disease1 7.3636364       Inf 1.0227273 0.4090909 0.2045455
disease2 1.8333333 1.0185185       Inf 0.6111111 0.5238095
disease3 0.3055556 0.4583333 0.6111111       Inf 2.8518519
disease4 0.0000000 0.2222222 0.5000000 3.5000000       Inf
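
For reference, here is a minimal base R sketch of one way a table of this shape might be produced without an explicit for loop. It is an illustration, not a settled solution. The data frame is re-entered by hand from the printout above so that the block runs on its own, and the names rr, rr_mat, m and n1 are placeholders chosen for this example. The first version applies the single-disease calculation to every column with sapply; the second gets the same ratios from matrix cross-products.

```r
# Example data, typed in from the printed data frame (rows are individuals 0..19).
disease.df <- data.frame(
  disease0 = rep(c(0, 1), each = 10),
  disease1 = c(0,0,0,0,1,1,0,0,0,0, 1,1,1,1,1,1,0,1,1,1),
  disease2 = c(1,0,1,0,0,0,0,0,1,0, 0,0,1,1,1,1,1,0,1,0),
  disease3 = c(0,1,1,1,0,1,0,1,1,1, 0,0,0,1,0,0,0,1,0,0),
  disease4 = rep(c(1, 0), each = 10),
  row.names = 0:19
)

# Version 1: repeat the single-disease calculation for every column.
# For each disease X (vector x), divide the prevalence of every disease among
# individuals with X by its prevalence among individuals without X.
# sapply() returns one column per X, so t() puts X on the rows.
rr <- t(sapply(disease.df, function(x) {
  colMeans(disease.df[x == 1, , drop = FALSE]) /
    colMeans(disease.df[x != 1, , drop = FALSE])
}))

# Version 2: the same ratios from matrix cross-products.
m  <- as.matrix(disease.df)
n1 <- colSums(m)                            # number of individuals with each X
p1 <- crossprod(m) / n1                     # P(Y = 1 | X = 1), X on rows
p0 <- crossprod(1 - m, m) / (nrow(m) - n1)  # P(Y = 1 | X = 0), X on rows
rr_mat <- p1 / p0

print(rr)
```

In both versions rows are the conditioning disease X and columns are the outcome disease Y. Any pair where a group is empty or the prevalence in both groups is zero would give NaN rather than Inf, so real data may need a check for that.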
I saved the edit above before seeing Ronak's comment below. Sorry for any confusion. – Josh