Reshape three column data frame to matrix ("long" to "wide" format)
I have a data.frame that looks like this.
x a 1
x b 2
x c 3
y a 3
y b 3
y c 2
I want this in matrix form so I can feed it to heatmap to make a plot. The result should look something like:
a b c
x 1 2 3
y 3 3 2
I tried cast from the reshape package, and I also tried writing a manual function to do this, but I can't seem to get it right.
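There are many ways to do this. This answer starts with my favorite ways, but also collects various approaches from answers to similar questions scattered around this site.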
tmp <- data.frame(x=gl(2,3, labels=letters[24:25]),
                  y=gl(3,1,6, labels=letters[1:3]),
                  z=c(1,2,3,3,3,2))
Using reshape2:
library(reshape2)
acast(tmp, x~y, value.var="z")
Using matrix indexing:
with(tmp, {
  out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
                dimnames=list(levels(x), levels(y)))
  out[cbind(x, y)] <- z
  out
})
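The matrix-indexing version leans on two base R behaviours: cbind on two factors yields a two-column matrix of their integer codes, and indexing a matrix with such a two-column matrix addresses one (row, column) cell per row. A small sketch of those two steps in isolation, using the same example data; the names idx and m are only for illustration:

```r
# Same example data as in the answer.
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# cbind() drops the factor class and keeps the integer codes,
# giving one (row index, column index) pair per input row.
idx <- with(tmp, cbind(x, y))

# Empty target matrix, labelled by the factor levels.
m <- matrix(NA_real_, nrow = nlevels(tmp$x), ncol = nlevels(tmp$y),
            dimnames = list(levels(tmp$x), levels(tmp$y)))

# A two-column index matrix assigns each z value to its own cell.
m[idx] <- tmp$z
m
```

Any (x, y) combination that is absent from the data simply stays NA in this approach.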
Using xtabs:
xtabs(z~x+y, data=tmp)
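Two things are worth keeping in mind here: xtabs returns an object of class "xtabs"/"table" rather than a bare matrix, and it sums z over any repeated (x, y) pairs. If a plain matrix without the table class is wanted, something like the following sketch should work; xt and m are illustrative names, not part of the original answer:

```r
# Same example data as in the answer.
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

xt <- xtabs(z ~ x + y, data = tmp)

# Rebuild a plain matrix from the table's values and its dimnames.
# unname() drops the "x"/"y" labels attached to the dimnames list.
m <- matrix(as.vector(xt), nrow = nrow(xt),
            dimnames = unname(dimnames(xt)))
m
```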
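You can also use reshape, as suggested here: Convert table into matrix by column names, though you have to do a little manipulation afterwards to remove an extra column and get the names right (not shown).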
> reshape(tmp, idvar="x", timevar="y", direction="wide")
x z.a z.b z.c
1 x 1 2 3
4 y 3 3 2
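The cleanup that the answer leaves out might look roughly like this. It is only a sketch on the same example data: it assumes the id column comes first and that the value columns carry the "z." prefix shown in the output above, and the names wide and m are made up for illustration.

```r
# Same example data as in the answer.
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

wide <- reshape(tmp, idvar = "x", timevar = "y", direction = "wide")

# Drop the id column and convert the remaining columns to a matrix.
m <- as.matrix(wide[, -1])

# Use the id values as row names and strip the "z." prefix from the columns.
rownames(m) <- as.character(wide$x)
colnames(m) <- sub("^z\\.", "", colnames(m))
m
```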
There's also sparseMatrix within the Matrix package, as seen here: R - convert BIG table into matrix by column names
> with(tmp, sparseMatrix(i = as.numeric(x), j=as.numeric(y), x=z,
+ dimnames=list(levels(x), levels(y))))
2 x 3 sparse Matrix of class "dgCMatrix"
a b c
x 1 2 3
y 3 3 2
The daply function from the plyr library could also be used, as here: https://stackoverflow.com/a/7020101/210673
> library(plyr)
> daply(tmp, .(x, y), function(x) x$z)
y
x a b c
x 1 2 3
y 3 3 2
dcast from reshape2 also works, as here: Reshape data for values in one column, but you get a data.frame with a column for the x value.
> dcast(tmp, x~y, value.var="z")
x a b c
1 x 1 2 3
2 y 3 3 2
Similarly, spread from "tidyr" would also work for such a transformation:
library(tidyr)
spread(tmp, y, z)
# x a b c
# 1 x 1 2 3
# 2 y 3 3 2
'acast(tmp, x~y, value.var="z")' will give a matrix as output, with 'x' as the row.names – mnel 2012-10-08 04:56:00
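The question is some years old, but maybe some people are still interested in alternative answers.
If you don't want to load any packages, you could use this function: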
#' Converts three columns of a data.frame into a matrix -- e.g. to plot
#' the data via image() later on. Two of the columns form the row and
#' col dimensions of the matrix. The third column provides values for
#' the matrix.
#'
#' @param data data.frame: input data
#' @param rowtitle string: row-dimension; name of the column in data whose distinct values should be used as row names in the output matrix
#' @param coltitle string: col-dimension; name of the column in data whose distinct values should be used as column names in the output matrix
#' @param datatitle string: name of the column in data whose values should be filled into the output matrix
#' @param rowdecreasing logical: should the row names be in ascending (FALSE) or in descending (TRUE) order?
#' @param coldecreasing logical: should the col names be in ascending (FALSE) or in descending (TRUE) order?
#' @param default_value numeric: default value of matrix entries if no value exists in data.frame for the entries
#' @return matrix: matrix containing values of data[[datatitle]] with rownames data[[rowtitle]] and colnames data[[coltitle]]
#' @author Daniel Neumann
#' @date 2017-08-29
data.frame2matrix = function(data, rowtitle, coltitle, datatitle,
                             rowdecreasing = FALSE, coldecreasing = FALSE,
                             default_value = NA) {
  # check whether the titles exist as column names in the data.frame data
  if ((!(rowtitle%in%names(data)))
      || (!(coltitle%in%names(data)))
      || (!(datatitle%in%names(data)))) {
    stop('data.frame2matrix: bad row-, col-, or datatitle.')
  }
  # get number of rows in data
  ndata = dim(data)[1]
  # extract rownames and colnames for the matrix from the data.frame
  rownames = sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
  nrows = length(rownames)
  colnames = sort(unique(data[[coltitle]]), decreasing = coldecreasing)
  ncols = length(colnames)
  # initialize the matrix
  out_matrix = matrix(NA,
                      nrow = nrows, ncol = ncols,
                      dimnames=list(rownames, colnames))
  # iterate rows of data (seq_len() also handles a data.frame with zero rows)
  for (i1 in seq_len(ndata)) {
    # get matrix-row and matrix-column indices for the current data-row
    iR = which(rownames==data[[rowtitle]][i1])
    iC = which(colnames==data[[coltitle]][i1])
    # throw an error if the matrix entry (iR,iC) is already filled.
    if (!is.na(out_matrix[iR, iC])) stop('data.frame2matrix: double entry in data.frame')
    out_matrix[iR, iC] = data[[datatitle]][i1]
  }
  # set empty matrix entries to the default value
  out_matrix[is.na(out_matrix)] = default_value
  # return matrix
  return(out_matrix)
}
How it works:
myData = as.data.frame(list('dim1'=c('x', 'x', 'x', 'y','y','y'),
'dim2'=c('a','b','c','a','b','c'),
'values'=c(1,2,3,3,3,2)))
myMatrix = data.frame2matrix(myData, 'dim1', 'dim2', 'values')
myMatrix
> a b c
> x 1 2 3
> y 3 3 2
@AnandaMahto also has a great answer about this here: http://stackoverflow.com/a/14515736/210673 – Aaron 2013-01-25 05:42:43