2015-03-31

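Populate a new column in a df with new values

I'm looking to populate a new data frame column with a calculated value that is unique to each subgroup of the data. Here is my exact code: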

df <- read.csv('data_30_Mar2015.csv') 


df$dCT <- NA 

#FUNCTION 
calc_dCT <- function(sample, DF){ 

sample_df <- DF[ which(DF$Sample=='sample'),] 
print (sample_df) 
VIC <- sample_df[ which(sample_df$Reporter=='VIC'),] 
FAM <- sample_df[ which(sample_df$Reporter=='FAM'),] 

VIC_mean<-mean(VIC[,3]) 
FAM_mean<-mean(FAM[,3]) 

DCT <- FAM_mean - VIC_mean 

for (i in 1:length(sample_df)){ 
    sample_df[i,4] <- DCT 
    } 
DF<-merge(DF, sample_df, all=TRUE) 
} 

#CALLS TO FUNCTION 
calc_dCT('c48', df) 
calc_dCT('m48', df) 
calc_dCT('c72', df) 
calc_dCT('m72', df) 

print (df) 

Here is the output:

calc_dCT('c48', df) 
[1] Sample Reporter CT  dCT  
<0 rows> (or 0-length row.names) 
calc_dCT('m48', df) 
[1] Sample Reporter CT  dCT  
<0 rows> (or 0-length row.names) 
calc_dCT('c72', df) 
[1] Sample Reporter CT  dCT  
<0 rows> (or 0-length row.names) 
calc_dCT('m72', df) 
[1] Sample Reporter CT  dCT  
<0 rows> (or 0-length row.names) 

print (df) 
Sample Reporter  CT dCT 
1  m48  VIC 27.50595 NA 
2  m48  VIC 27.77835 NA 
3  m48  VIC 27.62321 NA 
4  m48  FAM 30.87295 NA 
5  m48  FAM 30.87967 NA 
6  m48  FAM 30.73427 NA 
7  c48  VIC 26.56715 NA 
8  c48  VIC 26.89787 NA 
9  c48  VIC 26.82587 NA 
10 c48  FAM 30.20642 NA 
11 c48  FAM 30.43074 NA 
12 c48  FAM 30.36933 NA 
13 m72  VIC 29.61585 NA 
14 m72  VIC 28.65742 NA 
15 m72  VIC 29.40057 NA 
16 m72  FAM 32.27304 NA 
17 m72  FAM 32.38696 NA 
18 m72  FAM 32.24386 NA 
19 c72  VIC 28.22370 NA 
20 c72  VIC 28.17342 NA 
21 c72  VIC 28.49104 NA 
22 c72  FAM 31.91751 NA 
23 c72  FAM 31.67524 NA 
24 c72  FAM 31.87287 NA 

It doesn't seem to be subsetting the data correctly, and I don't know why. I am trying to fill the 'dCT' column with the calculated value of DCT.

+2

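Could you explain in words what you are trying to achieve? What is DCT? Why are you running `DF$Sample=='sample'` when none of the values in `DF$Sample` equal 'sample'? What is your desired output? – 2015-03-31 10:31:17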

+0

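If you look at df, for example at sample 'm48': DCT = mean of FAM - mean of VIC. I want to add this value to every row of 'm48', and then repeat the process for 'c48' and so on. It should be DF$Sample == sample, where sample is a variable supplied to the function. Thanks for spotting 'sample'; it should just be sample without any quotation marks. But it still doesn't calculate mean of VIC - mean of FAM and append it to df. – user3062260 2015-03-31 10:43:09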

+0

Please remember to always post reproducible data, for example using dput or something similar. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – jhin 2015-03-31 12:09:45

Answers

2

Here is a possible solution using data.table (assuming you don't already have a dCT column):

library(data.table) 
setDT(df)[, dCT := mean(CT[Reporter=='FAM']) - mean(CT[Reporter=='VIC']), by = Sample][] 
# Sample Reporter  CT  dCT 
# 1: m48  VIC 27.50595 3.193127 
# 2: m48  VIC 27.77835 3.193127 
# 3: m48  VIC 27.62321 3.193127 
# 4: m48  FAM 30.87295 3.193127 
# 5: m48  FAM 30.87967 3.193127 
# 6: m48  FAM 30.73427 3.193127 
# 7: c48  VIC 26.56715 3.571867 
# 8: c48  VIC 26.89787 3.571867 
... 
0

The same thing can obviously be done with dplyr, so I thought I'd add another version.

df <- data.frame(Sample = c(rep("m48", 6), rep("c48", 6)), Reporter = c(rep("VIC", 3), rep("FAM", 3), rep("VIC", 3), rep("FAM", 3)), CT = c(27.50595, 27.77835, 27.62321, 30.87295, 30.87967, 30.73427, 26.56715, 26.89787, 26.82587, 30.20642, 30.43074, 30.36933)) 

library(dplyr) 
df %>% group_by(Sample) %>% 
    mutate(dCT = mean(CT[Reporter == 'FAM']) - mean(CT[Reporter == 'VIC'])) 
# Source: local data frame [12 x 4] 
# Groups: Sample 
# 
# Sample Reporter  CT  dCT 
# 1  m48  VIC 27.50595 3.193127 
# 2  m48  VIC 27.77835 3.193127 
# 3  m48  VIC 27.62321 3.193127 
# 4  m48  FAM 30.87295 3.193127 
# 5  m48  FAM 30.87967 3.193127 
# 6  m48  FAM 30.73427 3.193127 
# 7  c48  VIC 26.56715 3.571867 
# 8  c48  VIC 26.89787 3.571867 
# 9  c48  VIC 26.82587 3.571867 
# 10 c48  FAM 30.20642 3.571867 
# 11 c48  FAM 30.43074 3.571867 
# 12 c48  FAM 30.36933 3.571867 
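
For readers who have neither package installed, the same per-group calculation can be sketched in base R. This is only a minimal illustration, not part of the answer above: the name `group_dct` is made up for the example, and the data are the first two samples from the question.

```r
# Toy data copied from the question (samples m48 and c48 only).
df <- data.frame(
  Sample   = rep(c("m48", "c48"), each = 6),
  Reporter = rep(rep(c("VIC", "FAM"), each = 3), times = 2),
  CT       = c(27.50595, 27.77835, 27.62321, 30.87295, 30.87967, 30.73427,
               26.56715, 26.89787, 26.82587, 30.20642, 30.43074, 30.36933)
)

# Split the rows by Sample and compute mean(FAM) - mean(VIC) in each group.
# group_dct (hypothetical name) is a numeric vector named by sample.
group_dct <- sapply(split(df, df$Sample), function(g) {
  mean(g$CT[g$Reporter == "FAM"]) - mean(g$CT[g$Reporter == "VIC"])
})

# Look up each row's group value by sample name and store it in a new column.
df$dCT <- unname(group_dct[as.character(df$Sample)])

print(df)
```

Every row of a sample ends up with that sample's value, which is the same result the grouped `:=` and `mutate` calls produce.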

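Just because I know it is not satisfying to receive a response that says "what you did is bad, do this instead", here are some notes on what does not work in your original code. But note that I would still recommend one of the other solutions.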

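  1. R passes function arguments by value, not by reference. This means you cannot change the data frame df from inside the function, because you are only working on a copy. Instead, return the result and then use it to modify df.
  2. length(dataframe) does not do what you think it does: it returns the number of columns, not the number of rows. What you want is nrow(dataframe).
  3. Assigning a single constant value to every element of a data frame column does not need a loop; just assign the value and R will recycle it automatically.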
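
The three points above can be checked on a tiny data frame. This is just an illustrative sketch; the helper `add_flag` and the toy columns are invented for the example and do not come from the question.

```r
df <- data.frame(Sample = c("a", "a", "b"), CT = c(1, 2, 3))

# 1. Arguments are passed by value: the assignment inside the function
#    only changes the function's local copy.
add_flag <- function(DF) {   # hypothetical helper, for illustration only
  DF$flag <- TRUE
  DF                         # return the modified copy
}
ignored <- add_flag(df)                  # df itself is left untouched
changed_in_place <- "flag" %in% names(df)   # FALSE
df2 <- add_flag(df)                      # keep the returned copy instead
changed_in_copy <- "flag" %in% names(df2)   # TRUE

# 2. length() of a data frame counts columns, nrow() counts rows.
n_cols <- length(df)   # 2
n_rows <- nrow(df)     # 3

# 3. A single value is recycled over the whole column, no loop needed.
df$dCT <- 3.19
print(df$dCT)          # 3.19 3.19 3.19
```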

So here is a version of your code that works:

calc_dCT <- function(sample, DF){ 

    sample_df <- DF[ which(DF$Sample==sample),] 
    VIC <- sample_df[ which(sample_df$Reporter=='VIC'),] 
    FAM <- sample_df[ which(sample_df$Reporter=='FAM'),] 

    VIC_mean<-mean(VIC[,3]) 
    FAM_mean<-mean(FAM[,3]) 

    DCT <- FAM_mean - VIC_mean 

    sample_df$dCT <- DCT 

    sample_df 
} 

dfnew <- data.frame(Sample=character(), Reporter=character(), CT=numeric(), dCT=numeric()) 
for (sample_name in unique(df$Sample)) 
    dfnew <- rbind(dfnew, calc_dCT(sample_name, df)) 
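
As a usage check, the function-based approach can be run end to end on the first two samples from the question. The sketch below is a variation, not the code above verbatim: it addresses the CT column by name rather than by position, and it collects the pieces with `lapply` and a single `rbind` instead of growing `dfnew` inside a loop.

```r
# Toy data copied from the question (samples m48 and c48 only).
df <- data.frame(
  Sample   = rep(c("m48", "c48"), each = 6),
  Reporter = rep(rep(c("VIC", "FAM"), each = 3), times = 2),
  CT       = c(27.50595, 27.77835, 27.62321, 30.87295, 30.87967, 30.73427,
               26.56715, 26.89787, 26.82587, 30.20642, 30.43074, 30.36933)
)

# Return the rows of one sample with its dCT value attached.
calc_dCT <- function(sample, DF) {
  sample_df <- DF[which(DF$Sample == sample), ]
  FAM_mean  <- mean(sample_df$CT[sample_df$Reporter == "FAM"])
  VIC_mean  <- mean(sample_df$CT[sample_df$Reporter == "VIC"])
  sample_df$dCT <- FAM_mean - VIC_mean   # one value, recycled over all rows
  sample_df
}

# Apply the function to every sample name, then bind the pieces together once.
pieces <- lapply(unique(df$Sample), calc_dCT, DF = df)
dfnew  <- do.call(rbind, pieces)

print(dfnew)
```

Because the function returns its result instead of trying to modify `df`, the caller decides what to do with it; here the combined rows are stored in `dfnew`.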