2013-08-31 101 views
1

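Conditional update of matrix elements based on conditional statement

I believe I'm just making a simple mistake. I have a large 3307592x9 matrix that I need to iterate over: if column 8 (character/string) == column 9 (character/string), ignoring case, then columns 3-7 (numbers between 0 and 1) need to become 1 minus themselves. The code I wrote is: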

for (i in 1:3307592){ 
    if(grepl(chr2SnpFreqNorm[i,8], chr2SnpFreqNorm[i,9], ignore.case=TRUE)){ 
     chr2SnpFreqNorm[i,3] <- 1 - chr2SnpFreqNorm[i,3] 
     chr2SnpFreqNorm[i,4] <- 1 - chr2SnpFreqNorm[i,4] 
     chr2SnpFreqNorm[i,5] <- 1 - chr2SnpFreqNorm[i,5] 
     chr2SnpFreqNorm[i,6] <- 1 - chr2SnpFreqNorm[i,6] 
     chr2SnpFreqNorm[i,7] <- 1 - chr2SnpFreqNorm[i,7] 
    } 
} 

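When I try to run it, my R client just hangs; after more than half an hour I cancel the command. I'm not sure what I'm doing wrong, because the code looks correct to me.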

EDIT: example data

> chr2SnpFreqNorm[1:10,] 
     ID pos ceuChr2SnpFreq chsChr2SnpFreq lwkChr2SnpFreq 
1 rs187078949 10133 0.070588235   0.000 0.030927835 
2 rs191522553 10140 0.005882353   0.000 0.005154639 
3 rs149483862 10286 0.100000000   0.135 0.226804124 
4 rs150919307 10297 0.147058824   0.070 0.113402062 
5 rs186644623 10315 0.000000000   0.000 0.000000000 
6 rs193294418 10345 0.017647059   0.000 0.036082474 
7 rs185496709 10386 0.082352941   0.020 0.087628866 
8 rs188771313 10419 0.229411765   0.085 0.056701031 
9 rs192945962 10425 0.100000000   0.020 0.015463918 
10 rs184397180 10431 0.064705882   0.005 0.036082474 
    tsiChr2SnpFreq yriChr2SnpFreq ALT AA 
1  0.035714286 0.045454545 A a 
2  0.005102041 0.005681818 A C 
3  0.239795918 0.170454545 A t 
4  0.168367347 0.130681818 T t 
5  0.000000000 0.005681818 G C 
6  0.030612245 0.028409091 A G 
7  0.035714286 0.113636364 T t 
8  0.147959184 0.090909091 G G 
9  0.091836735 0.034090909 G c 
10 0.015306122 0.045454545 T a 

> 
+0

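Your main mistake is using a `for` loop instead of a vectorized operation. (For your data size I would recommend package data.table.) I'm also not clear on why you use `grepl`; a combination of `tolower` and `==` should be enough. It would be easier to show you how to do this if you [gave example data](http://stackoverflow.com/a/5963610/1412059). – Roland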
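To illustrate the point about `grepl`: it treats the column-8 value as a regular expression and checks whether it occurs *anywhere* in the column-9 value, which is not the same as case-insensitive equality. For single-letter alleles the two tests agree, but they can diverge for regex metacharacters or multi-letter values. The allele pairs below are made up purely for illustration.

```r
# Hypothetical allele pairs (not taken from the question's data)
alt <- c("A", "A", ".", "A")
aa  <- c("a", "C", "G", "AT")

# grepl() takes a single pattern, so apply it pair by pair;
# "." is a regex wildcard and "A" is found inside "AT"
by_grepl <- mapply(grepl, alt, aa, MoreArgs = list(ignore.case = TRUE),
                   USE.NAMES = FALSE)

# Case-insensitive exact equality, fully vectorised
by_equal <- tolower(alt) == tolower(aa)

print(by_grepl)  # TRUE FALSE TRUE TRUE
print(by_equal)  # TRUE FALSE FALSE FALSE
```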

+0

Something like

fixAncestor <- function(x) {
    if (tolower(x[8]) == tolower(x[9])) {
        x[3] <- 1 - x[3]
        x[4] <- 1 - x[4]
        x[5] <- 1 - x[5]
        x[6] <- 1 - x[6]
        x[7] <- 1 - x[7]
    }
    x  # return the row, whether or not it was modified
}
–

+0

Added sample data –

Answers

1

In base R you can simply do

flip <- Vectorize(grepl)(chr2SnpFreqNorm[,8], chr2SnpFreqNorm[,9], ignore.case=TRUE) 

chr2SnpFreqNorm[flip,3:7] <- 1 - chr2SnpFreqNorm[flip,3:7] 
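For reference, `Vectorize(grepl)` returns a wrapper that loops over its arguments with `mapply`. A minimal sketch of the same flip written with an explicit `mapply` call, on a tiny hypothetical data frame (`df`, `f1` and `f2` are made-up names standing in for `chr2SnpFreqNorm` and its columns 3-7):

```r
# Tiny stand-in for chr2SnpFreqNorm (hypothetical values and column names)
df <- data.frame(f1  = c(0.1, 0.2, 0.3),
                 f2  = c(0.4, 0.5, 0.6),
                 ALT = c("A", "A", "T"),
                 AA  = c("a", "C", "t"),
                 stringsAsFactors = FALSE)

# Same row test as Vectorize(grepl)(...), written as an explicit mapply loop
flip <- mapply(grepl, df$ALT, df$AA, MoreArgs = list(ignore.case = TRUE),
               USE.NAMES = FALSE)

# Replace the frequency columns with 1 - value on the matching rows only
df[flip, c("f1", "f2")] <- 1 - df[flip, c("f1", "f2")]
print(df)
```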

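This may be a bit slow, because Vectorize hides a loop. But if all you need is to flip the rows where columns 8 and 9 match exactly (apart from case), use this filter instead: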

flip <- tolower(chr2SnpFreqNorm[,8])==tolower(chr2SnpFreqNorm[,9]) 
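Put together, the `tolower` filter and the column 3-7 update look like this on the first four rows of the question's example data. This is a sketch assuming the data is a data frame with the same column order as in the question; only rows 1 and 4 (`A`/`a` and `T`/`t`) should be flipped.

```r
# First four rows of the example data from the question
d <- data.frame(
  ID  = c("rs187078949", "rs191522553", "rs149483862", "rs150919307"),
  pos = c(10133L, 10140L, 10286L, 10297L),
  ceuChr2SnpFreq = c(0.070588235, 0.005882353, 0.1, 0.147058824),
  chsChr2SnpFreq = c(0, 0, 0.135, 0.07),
  lwkChr2SnpFreq = c(0.030927835, 0.005154639, 0.226804124, 0.113402062),
  tsiChr2SnpFreq = c(0.035714286, 0.005102041, 0.239795918, 0.168367347),
  yriChr2SnpFreq = c(0.045454545, 0.005681818, 0.170454545, 0.130681818),
  ALT = c("A", "A", "A", "T"),
  AA  = c("a", "C", "t", "t"),
  stringsAsFactors = FALSE
)

# One vectorised comparison over all rows: no loop, no regex
flip <- tolower(d[, 8]) == tolower(d[, 9])

# 1 - value on columns 3:7, but only for the matching rows
d[flip, 3:7] <- 1 - d[flip, 3:7]
print(d)
```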
1

`for` is not your friend in R; here's a solution using `apply` and conditional indexing:

## create some toy data (runif values and alleles end up as character in one matrix)
data <- matrix(ncol = 5, nrow = 100, c(runif(300), sample(c('A','G','C','T','a','c','g','t'), 200, replace = TRUE)))

flip_allele_freqs <- function(x) { 
## return 1 - x for any x that looks like a number less than 1; leave anything else unchanged
    n <- suppressWarnings(as.numeric(x))
    if (is.na(n)) { ## can't convert to numeric, must be a string
        return(x)
    }
    if (n < 1) {
        return(1 - n)
    } else {
        return(x)
    }
}

## apply the flip function to every cell of the rows where the last two columns
## are equal (case-insensitive), then fold the new values back into the matrix
rows <- toupper(data[, 5]) == toupper(data[, 4])
data[rows, ] <- apply(data[rows, , drop = FALSE], c(1, 2), flip_allele_freqs)

Good luck with the GWAS!

+0

I'm pretty sure this did it. Thanks! –

+1

If you use `ifelse` instead of `if` and `else`, there is no need for `apply` (which only makes this slower). – Roland
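A minimal sketch of the `ifelse` idea, assuming the frequencies are kept in their own numeric matrix instead of the all-character matrix used above (the values and the names `freqs`, `alt`, `aa` are made up). One detail to watch: `ifelse` returns a result shaped like its `test` argument, so the per-row flag has to be expanded to the matrix's dimensions first.

```r
# Hypothetical toy data: numeric frequencies plus two allele vectors
freqs <- matrix(c(0.1, 0.2, 0.3,
                  0.4, 0.5, 0.6), nrow = 3)
alt <- c("A", "C", "g")
aa  <- c("a", "T", "G")

# One logical per row: do the alleles match, ignoring case?
same <- toupper(alt) == toupper(aa)

# Expand the row flag to a matrix with the same dimensions as freqs
# (matrix() fills column by column, so each column gets the same flags)
same_mat <- matrix(same, nrow = nrow(freqs), ncol = ncol(freqs))

# Vectorised flip: 1 - value where the alleles match, unchanged elsewhere
flipped <- ifelse(same_mat, 1 - freqs, freqs)
print(flipped)
```

Keeping the numbers numeric also avoids the round trip through `as.numeric` on character strings.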

2

First, your data:

DF <- structure(list(ID = c("rs187078949", "rs191522553", "rs149483862", 
"rs150919307", "rs186644623", "rs193294418", "rs185496709", "rs188771313", 
"rs192945962", "rs184397180"), pos = c(10133L, 10140L, 10286L, 
10297L, 10315L, 10345L, 10386L, 10419L, 10425L, 10431L), ceuChr2SnpFreq = c(0.070588235, 
0.005882353, 0.1, 0.147058824, 0, 0.017647059, 0.082352941, 0.229411765, 
0.1, 0.064705882), chsChr2SnpFreq = c(0, 0, 0.135, 0.07, 0, 0, 
0.02, 0.085, 0.02, 0.005), lwkChr2SnpFreq = c(0.030927835, 0.005154639, 
0.226804124, 0.113402062, 0, 0.036082474, 0.087628866, 0.056701031, 
0.015463918, 0.036082474), tsiChr2SnpFreq = c(0.035714286, 0.005102041, 
0.239795918, 0.168367347, 0, 0.030612245, 0.035714286, 0.147959184, 
0.091836735, 0.015306122), yriChr2SnpFreq = c(0.045454545, 0.005681818, 
0.170454545, 0.130681818, 0.005681818, 0.028409091, 0.113636364, 
0.090909091, 0.034090909, 0.045454545), ALT = c("A", "A", "A", 
"T", "G", "A", "T", "G", "G", "T"), AA = c("a", "C", "t", "t", 
"C", "G", "t", "G", "c", "a")), .Names = c("ID", "pos", "ceuChr2SnpFreq", 
"chsChr2SnpFreq", "lwkChr2SnpFreq", "tsiChr2SnpFreq", "yriChr2SnpFreq", 
"ALT", "AA"), row.names = c("1", "2", "3", "4", "5", "6", "7", 
"8", "9", "10"), class = "data.frame") 

And now the data.table solution:

#use data.table for excellent efficiency 
library(data.table) 
DT <- data.table(DF) 

#replace columns 3 to 7 with 1 - value where ALT and AA are equal (case-insensitive) 
DT[tolower(ALT)==tolower(AA), 3:7 := lapply(.SD, function(x) 1 - x), .SDcols=3:7] 

#    ID pos ceuChr2SnpFreq chsChr2SnpFreq lwkChr2SnpFreq tsiChr2SnpFreq yriChr2SnpFreq ALT AA 
# 1: rs187078949 10133 0.929411765   1.000 0.969072165 0.964285714 0.954545455 A a 
# 2: rs191522553 10140 0.005882353   0.000 0.005154639 0.005102041 0.005681818 A C 
# 3: rs149483862 10286 0.100000000   0.135 0.226804124 0.239795918 0.170454545 A t 
# 4: rs150919307 10297 0.852941176   0.930 0.886597938 0.831632653 0.869318182 T t 
# 5: rs186644623 10315 0.000000000   0.000 0.000000000 0.000000000 0.005681818 G C 
# 6: rs193294418 10345 0.017647059   0.000 0.036082474 0.030612245 0.028409091 A G 
# 7: rs185496709 10386 0.917647059   0.980 0.912371134 0.964285714 0.886363636 T t 
# 8: rs188771313 10419 0.770588235   0.915 0.943298969 0.852040816 0.909090909 G G 
# 9: rs192945962 10425 0.100000000   0.020 0.015463918 0.091836735 0.034090909 G c 
# 10: rs184397180 10431 0.064705882   0.005 0.036082474 0.015306122 0.045454545 T a 
+0

I'll have to look up := –

+0

Just read the [data.table intro](http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf) and the [FAQ](http://cran.r-project.org/web/packages/data.table/vignettes/datatable-faq.pdf). – Roland