2015-05-10 491 views
-1

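R: How to convert a matrix to a numeric matrix?

I created a scoring matrix in a text file, for use in local alignment with the pairwiseAlignment function. I then read it into R with this call: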

ex <- as.matrix(read.table("~/scoringMatrix", header=FALSE, sep = "\t", row.names = 1, as.is=TRUE))

The format looks like this:

> ex 

    A T C G 
A 5 -2 -1 -2 
T -2 7 -1 -2 
C -1 -1 7 2 
G -2 -2 2 8 

Now, whenever I use the pairwiseAlignment function, I get the following error:

pairwiseAlignment(x[[1]], x[[2]], substitutionMatrix = ex, gapOpening = -2, gapExtension = -8, scoreOnly = FALSE) 
    Error in XStringSet.pairwiseAlignment(pattern = pattern, subject = subject, : 
     'substitutionMatrix' must be a numeric matrix 

If I use an existing substitution matrix such as BLOSUM50, it works fine. So how do I make this matrix suitable for pairwiseAlignment?

> dput(ex) 
structure(logical(0), .Dim = c(5L, 0L), .Dimnames = list(c(" A T C G", 
"A 5 -2 -1 -2", "T -2 7 -1 -2", "C -1 -1 7 2", "G -2 -2 2 8" 
), NULL)) 

whereas dput(BLOSUM50) looks completely different:

> dput(BLOSUM50) 
structure(c(5L, -2L, -1L, -2L, -1L, -1L, -1L, 0L, -2L, -1L, -2L, 
-1L, -1L, -3L, -1L, 1L, 0L, -3L, -2L, 0L, -2L, -1L, -1L, -5L, 
-2L, 7L, -1L, -2L, -4L, 1L, 0L, -3L, 0L, -4L, -3L, 3L, -2L, -3L, 
-3L, -1L, -1L, -3L, -1L, -3L, -1L, 0L, -1L, -5L, -1L, -1L, 7L, 
2L, -2L, 0L, 0L, 0L, 1L, -3L, -4L, 0L, -2L, -4L, -2L, 1L, 0L, 
-4L, -2L, -3L, 4L, 0L, -1L, -5L, -2L, -2L, 2L, 8L, -4L, 0L, 2L, 
-1L, -1L, -4L, -4L, -1L, -4L, -5L, -1L, 0L, -1L, -5L, -3L, -4L, 
5L, 1L, -1L, -5L, -1L, -4L, -2L, -4L, 13L, -3L, -3L, -3L, -3L, 
-2L, -2L, -3L, -2L, -2L, -4L, -1L, -1L, -5L, -3L, -1L, -3L, -3L, 
-2L, -5L, -1L, 1L, 0L, 0L, -3L, 7L, 2L, -2L, 1L, -3L, -2L, 2L, 
0L, -4L, -1L, 0L, -1L, -1L, -1L, -3L, 0L, 4L, -1L, -5L, -1L, 
0L, 0L, 2L, -3L, 2L, 6L, -3L, 0L, -4L, -3L, 1L, -2L, -3L, -1L, 
-1L, -1L, -3L, -2L, -3L, 1L, 5L, -1L, -5L, 0L, -3L, 0L, -1L, 
-3L, -2L, -3L, 8L, -2L, -4L, -4L, -2L, -3L, -4L, -2L, 0L, -2L, 
-3L, -3L, -4L, -1L, -2L, -2L, -5L, -2L, 0L, 1L, -1L, -3L, 1L, 
0L, -2L, 10L, -4L, -3L, 0L, -1L, -1L, -2L, -1L, -2L, -3L, 2L, 
-4L, 0L, 0L, -1L, -5L, -1L, -4L, -3L, -4L, -2L, -3L, -4L, -4L, 
-4L, 5L, 2L, -3L, 2L, 0L, -3L, -3L, -1L, -3L, -1L, 4L, -4L, -3L, 
-1L, -5L, -2L, -3L, -4L, -4L, -2L, -2L, -3L, -4L, -3L, 2L, 5L, 
-3L, 3L, 1L, -4L, -3L, -1L, -2L, -1L, 1L, -4L, -3L, -1L, -5L, 
-1L, 3L, 0L, -1L, -3L, 2L, 1L, -2L, 0L, -3L, -3L, 6L, -2L, -4L, 
-1L, 0L, -1L, -3L, -2L, -3L, 0L, 1L, -1L, -5L, -1L, -2L, -2L, 
-4L, -2L, 0L, -2L, -3L, -1L, 2L, 3L, -2L, 7L, 0L, -3L, -2L, -1L, 
-1L, 0L, 1L, -3L, -1L, -1L, -5L, -3L, -3L, -4L, -5L, -2L, -4L, 
-3L, -4L, -1L, 0L, 1L, -4L, 0L, 8L, -4L, -3L, -2L, 1L, 4L, -1L, 
-4L, -4L, -2L, -5L, -1L, -3L, -2L, -1L, -4L, -1L, -1L, -2L, -2L, 
-3L, -4L, -1L, -3L, -4L, 10L, -1L, -1L, -4L, -3L, -3L, -2L, -1L, 
-2L, -5L, 1L, -1L, 1L, 0L, -1L, 0L, -1L, 0L, -1L, -3L, -3L, 0L, 
-2L, -3L, -1L, 5L, 2L, -4L, -2L, -2L, 0L, 0L, -1L, -5L, 0L, -1L, 
0L, -1L, -1L, -1L, -1L, -2L, -2L, -1L, -1L, -1L, -1L, -2L, -1L, 
2L, 5L, -3L, -2L, 0L, 0L, -1L, 0L, -5L, -3L, -3L, -4L, -5L, -5L, 
-1L, -3L, -3L, -3L, -3L, -2L, -3L, -1L, 1L, -4L, -4L, -3L, 15L, 
2L, -3L, -5L, -2L, -3L, -5L, -2L, -1L, -2L, -3L, -3L, -1L, -2L, 
-3L, 2L, -1L, -1L, -2L, 0L, 4L, -3L, -2L, -2L, 2L, 8L, -1L, -3L, 
-2L, -1L, -5L, 0L, -3L, -3L, -4L, -1L, -3L, -3L, -4L, -4L, 4L, 
1L, -3L, 1L, -1L, -3L, -2L, 0L, -3L, -1L, 5L, -4L, -3L, -1L, 
-5L, -2L, -1L, 4L, 5L, -3L, 0L, 1L, -1L, 0L, -4L, -4L, 0L, -3L, 
-4L, -2L, 0L, 0L, -5L, -3L, -4L, 5L, 2L, -1L, -5L, -1L, 0L, 0L, 
1L, -3L, 4L, 5L, -2L, 0L, -3L, -3L, 1L, -1L, -4L, -1L, 0L, -1L, 
-2L, -2L, -3L, 2L, 5L, -1L, -5L, -1L, -1L, -1L, -1L, -2L, -1L, 
-1L, -2L, -1L, -1L, -1L, -1L, -1L, -2L, -2L, -1L, 0L, -3L, -1L, 
-1L, -1L, -1L, -1L, -5L, -5L, -5L, -5L, -5L, -5L, -5L, -5L, -5L, 
-5L, -5L, -5L, -5L, -5L, -5L, -5L, -5L, -5L, -5L, -5L, -5L, -5L, 
-5L, -5L, 1L), .Dim = c(24L, 24L), .Dimnames = list(c("A", "R", 
"N", "D", "C", "Q", "E", "G", "H", "I", "L", "K", "M", "F", "P", 
"S", "T", "W", "Y", "V", "B", "Z", "X", "*"), c("A", "R", "N", 
"D", "C", "Q", "E", "G", "H", "I", "L", "K", "M", "F", "P", "S", 
"T", "W", "Y", "V", "B", "Z", "X", "*"))) 
+0

Does 'apply(x,2,as.numeric)' work? –

+0

@TadDallas Do you mean 'ex2 <- apply(ex,2,as.numeric)'? If that is what you mean, it does not work. – estranged

+0

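Are you sure you want 'header = FALSE'? You appear to have a header, 'A T C G'. If so, using 'header = FALSE' makes those labels the first row of the matrix, and your matrix will therefore be 'character'. – user20650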

Answers

2

It looks like your 'scoringMatrix' file has space-separated columns, so it can be read in simply with

ex = as.matrix(read.delim("scoringMatrix", sep="")) 

which has the structure

> dput(ex) 
structure(c(5L, -2L, -1L, -2L, -2L, 7L, -1L, -2L, -1L, -1L, 7L, 
2L, -2L, -2L, 2L, 8L), .Dim = c(4L, 4L), .Dimnames = list(c("A", 
"T", "C", "G"), c("A", "T", "C", "G"))) 

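Your input contains no tab characters (\t), so each line is read as a single column. And row.names=1 means that this single column is used as the row names, so you end up with 5 rows and zero columns: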

> read.table("scoringMatrix", sep="\t", header=FALSE, row.names=1) 
data frame with 0 columns and 5 rows 

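Coercing this data frame to a matrix gives a 5 x 0 matrix; what you were looking at in your original display was the row names of that matrix (!).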
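The difference can be reproduced without the real file. The following is a minimal sketch in base R that writes the same space-separated table to a temporary file and reads it both ways; the temporary path and the names `bad` and `good` are made up for illustration.

```r
# Write the space-separated scoring table from the question to a temp file.
path <- tempfile()
writeLines(c("   A  T  C  G",
             "A  5 -2 -1 -2",
             "T -2  7 -1 -2",
             "C -1 -1  7  2",
             "G -2 -2  2  8"), path)

# sep = "\t": the file has no tabs, so every line is a single field, and
# row.names = 1 consumes that field, leaving zero data columns.
bad <- as.matrix(read.table(path, header = FALSE, sep = "\t",
                            row.names = 1, as.is = TRUE))
dim(bad)
is.numeric(bad)

# The default sep = "" splits on any whitespace. The header line has one
# field fewer than the data lines, so read.table() uses the first column
# as the row names.
good <- as.matrix(read.table(path, header = TRUE))
dim(good)
is.numeric(good)
```

Printing `dim()` and `is.numeric()` for both objects shows which of the two is usable as a substitution matrix.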

The matrix could also be created 'by hand' in R, as suggested by @DavidArenburg, with

# Labels follow the order used in the scoring file: A, T, C, G
matrix(c( 5, -2, -1, -2,
         -2,  7, -1, -2,
         -1, -1,  7,  2,
         -2, -2,  2,  8),
       nrow=4, ncol=4,
       dimnames=list(
         c("A", "T", "C", "G"),
         c("A", "T", "C", "G")),
       byrow=TRUE)
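Whichever way the matrix is built, a quick sanity check before calling pairwiseAlignment can save a round trip. This is only a rough sketch in base R: the helper name `is_scoring_matrix` is hypothetical, and the checks are reasonable guesses based on the error message, not the validation that pairwiseAlignment itself performs. It also shows one way to convert a character matrix (the case raised in the comments about 'header = FALSE') while keeping its dimnames.

```r
# Hypothetical helper: TRUE if m is a square numeric matrix whose row and
# column labels are present and identical.
is_scoring_matrix <- function(m) {
  is.matrix(m) && is.numeric(m) &&
    nrow(m) == ncol(m) &&
    !is.null(rownames(m)) &&
    identical(rownames(m), colnames(m))
}

# A character matrix, as you would get if the numbers were read as text.
chr <- matrix(c("5", "-2", "-2", "7"), nrow = 2,
              dimnames = list(c("A", "T"), c("A", "T")))
is_scoring_matrix(chr)

# mode<- converts the entries and keeps dim and dimnames, whereas
# apply(chr, 2, as.numeric) would drop the row names.
num <- chr
mode(num) <- "numeric"
is_scoring_matrix(num)
```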
+0

This works! Thanks for the good explanation. – estranged

+1

The OP actually created this matrix by hand in a text editor. Another option is to create a similar matrix using R, something like 'head(BLOSUM50[, 1:4], 4)' –

+0

@Martin, oh, that is good information about how to create it by hand. – estranged

2

Another option is to select just the required rows/columns from BLOSUM50 using match, which avoids creating this file by hand in a text editor

indx <- match(c("A", "T", "C", "G"), rownames(BLOSUM50)) 
BLOSUM50[indx, indx] 
# A T C G 
# A 5 0 -1 0 
# T 0 5 -1 -2 
# C -1 -1 13 -3 
# G 0 -2 -3 8
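Because the matrix carries dimnames, indexing by name should give the same result as going through match. Here is a small sketch on a stand-in matrix (`big`, with arbitrary values and a made-up label set) so that it runs without Biostrings loaded; the same pattern is assumed to carry over to BLOSUM50.

```r
# Stand-in for a larger substitution matrix; the values are arbitrary.
labels <- c("A", "R", "N", "C", "G", "T")
big <- matrix(seq_len(36), nrow = 6, dimnames = list(labels, labels))

bases <- c("A", "T", "C", "G")

# Positional indexing via match(), as in the answer above.
indx <- match(bases, rownames(big))
by_position <- big[indx, indx]

# Indexing directly by row/column name.
by_name <- big[bases, bases]

identical(by_position, by_name)

# match() returns NA for a label that is not present, so it is worth
# checking before subsetting.
anyNA(match(c("A", "U"), rownames(big)))
```

Name indexing stops with an error when a label is missing, while match silently yields NA, so which one to prefer depends on how you want a typo to surface.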