Fill in a new column of a data frame with a lookup from a double matrix

I have a data frame df:

colour shape 
'red' circle 
'blue' square 
'blue' circle 
'green' sphere

and a double matrix m with named rows/columns:

      circle square sphere
red        1      4      7
blue       2      5      8
green      3      6      9

I would like to add a new column to df so that I get:

id colour shape 
1 'red' circle 
5 'blue' square 
2 'blue' circle 
9 'green' sphere

I tried to do this with the following code, but it does not seem to work:

df$id <- m[df$colour,df$shape]

I also tried apply() and the like, but with no luck. Can anyone tell me the right way to do this without using a loop?


2012-03-21 Ina

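Thanks everyone for the help. @Tommy's answer below and DWin's answer were both very helpful. Since I am using character vectors in my real data, I went with DWin's for my scenario. – Ina 2012-03-22 14:26:54

I think I might win the shortest-answer contest here, as long as those columns are character vectors rather than factors (factors might be the more expected case unless you made a specific effort to avoid them). It really only adds cbind to convert the two "character" vectors of the data frame into the two-column matrix expected by the [.matrix function, which you were so close to using successfully. (It also seems reasonably expressive.)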

# Data construct 
d <- data.frame(color=c('red','blue','blue','green'), 
shape=c('circle','square','circle','sphere'), stringsAsFactors=FALSE) 
m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere'))) 
# Code: 

d$id <- with(d, m [ cbind(color, shape) ]) 
d 
  color  shape id
1   red circle  1
2  blue square  5
3  blue circle  2
4 green sphere  9
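
To see why the attempt in the question behaves differently, it may help to put the two indexing forms side by side. The following is only a sketch that rebuilds the same d and m; the names sub and ids are introduced here for illustration. Indexing with two separate vectors selects every row/column combination, while a single two-column matrix selects one element per row of the index.

```r
d <- data.frame(color = c("red", "blue", "blue", "green"),
                shape = c("circle", "square", "circle", "sphere"),
                stringsAsFactors = FALSE)
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))

# Two separate index vectors: every row/column combination (a 4 x 4 block)
sub <- m[d$color, d$shape]
dim(sub)

# The wanted values are only the diagonal of that block
unname(diag(sub))

# One two-column character matrix: one element per row of the index
ids <- m[cbind(d$color, d$shape)]
ids
```

Taking the diagonal of the 4 x 4 block would also work, but it computes n^2 values to keep n of them, so the matrix index is the more economical form.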


2012-03-21 22:11:49

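Note that this only works if the levels in 'd' are in the same order as the rownames/colnames of m. I tried to explain that in my answer. Try it again with 'm <- m[3:1,]' and watch it fail... – Tommy 2012-03-21 23:16:47

Oh, sorry, I did not read carefully: given that 'd' contains character vectors rather than factors, it actually works... My solution works in both cases, though ;-) – Tommy 2012-03-21 23:23:09

You can also use 'm[cbind(as.character(d$color), as.character(d$shape))]', which I think is more general and clearer – 2012-03-22 05:38:28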
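
The factor pitfall raised in these comments can be reproduced with a small sketch. It assumes the columns are factors, forced here with stringsAsFactors = TRUE (since R 4.0.0 data.frame() no longer converts strings to factors by default); the names codes, wrong and right are illustrative only.

```r
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))
d <- data.frame(color = c("red", "blue", "blue", "green"),
                shape = c("circle", "square", "circle", "sphere"),
                stringsAsFactors = TRUE)

# cbind() on factors keeps their integer codes; the levels are sorted
# alphabetically, so the codes do not line up with the row/column order of m
codes <- cbind(d$color, d$shape)
wrong <- m[codes]

# Coercing to character first makes the lookup go through the dimnames
right <- m[cbind(as.character(d$color), as.character(d$shape))]

wrong
right
```

With character columns the two forms agree, which is why the short version works for the data in this answer.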

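merge() is your friend here. To use it, we need a suitable data frame to merge with, containing a stacked version of the ID matrix. I create that as newdf with the code below: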

df <- data.frame(matrix(1:9, ncol = 3)) 
colnames(df) <- c("circle","square","sphere") 
rownames(df) <- c("red","blue","green") 

newdf <- cbind.data.frame(ID = unlist(df), 
          expand.grid(colour = rownames(df), 
             shape = colnames(df)))

which results in:

> newdf 
        ID colour  shape
circle1  1    red circle
circle2  2   blue circle
circle3  3  green circle
square1  4    red square
square2  5   blue square
square3  6  green square
sphere1  7    red sphere
sphere2  8   blue sphere
sphere3  9  green sphere

Then, with the original data in an object df2, defined using

df2 <- data.frame(colour = c("red","blue","blue","green"), 
        shape = c("circle","square","circle","sphere"))

use merge():

> merge(newdf, df2, sort = FALSE) 
  colour  shape ID
1    red circle  1
2   blue circle  2
3   blue square  5
4  green sphere  9

You can store that and rearrange the columns if that is what you need (and it is quick!):

> res <- merge(newdf, df2, sort = FALSE) 
> res <- res[,c(3,1,2)] 
> res 
  ID colour  shape
1  1    red circle
2  2   blue circle
3  5   blue square
4  9  green sphere
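
Note that the merged rows above come back in a different order from the input (the two blue rows are swapped). If the original order matters, one possible workaround is to carry a row index through the merge and sort on it afterwards. This is a sketch, not part of the answer as posted: the ord column and the long object are names made up here, and the stacked table is built with as.data.frame(as.table(m)) rather than the expand.grid() code shown earlier.

```r
m <- matrix(1:9, 3, 3,
            dimnames = list(colour = c("red", "blue", "green"),
                            shape  = c("circle", "square", "sphere")))

# Stack the matrix into long form: one row per colour/shape pair
long <- as.data.frame(as.table(m), responseName = "ID",
                      stringsAsFactors = FALSE)

df2 <- data.frame(colour = c("red", "blue", "blue", "green"),
                  shape  = c("circle", "square", "circle", "sphere"),
                  stringsAsFactors = FALSE)

# Remember the input order, merge, then restore that order
df2$ord <- seq_len(nrow(df2))
res <- merge(df2, long, by = c("colour", "shape"), sort = FALSE)
res <- res[order(res$ord), c("ID", "colour", "shape")]
rownames(res) <- NULL
res
```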


2012-03-21 21:08:12

A fairly simple way is to index your matrix with a matrix:

# Your data 
d <- data.frame(color=c('red','blue','blue','green'), shape=c('circle','square','circle','sphere')) 
m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere'))) 

# Create index matrix - each row is a row/col index 
i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m))) 

# Now use it and add as the id column... 
d2 <- cbind(id=m[i], d) 

d2 
# id color shape 
#1 1 red circle 
#2 5 blue square 
#3 2 blue circle 
#4 9 green sphere

The match function is used to find the numeric index corresponding to a given string.
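
One practical consequence of going through match is how values that are missing from the matrix names are handled. The sketch below uses a hypothetical colour "purple" that does not appear in m: match returns NA for it, and a numeric index matrix with an NA in a row yields NA in the result, whereas a character index matrix with an unmatched name stops with an error.

```r
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))

colors <- c("red", "purple")   # "purple" is deliberately absent from m
shapes <- c("circle", "circle")

# Numeric index built with match(): the unmatched row becomes NA
i <- cbind(match(colors, rownames(m)), match(shapes, colnames(m)))
ids <- m[i]
ids

# Character index matrix: an unmatched name is an error, caught here
char_result <- tryCatch(m[cbind(colors, shapes)],
                        error = function(e) "error")
char_result
```

Which behaviour is preferable depends on whether unknown keys should be flagged loudly or passed through as missing values.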

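Note that in newer versions of R (2.13 and later, I think) you can use character strings in the index matrix. Unfortunately, the color and shape columns are typically factors, and cbind does not handle those well (it uses the integer codes), so you need to coerce them with as.character: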

i <- cbind(as.character(d$color), as.character(d$shape))

...I suspect using match is more efficient, though.

EDIT: I measured it, and using match seems to be about 20% faster:

# Make 1 million rows 
d <- d[sample.int(nrow(d), 1e6, TRUE), ] 

system.time({ 
    i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m))) 
    d2 <- cbind(id=m[i], d) 
}) # 0.46 secs 


system.time({ 
    i <- cbind(as.character(d$color), as.character(d$shape)) 
    d2 <- cbind(id=m[i], d) 
}) # 0.55 secs


2012-03-21 21:12:56 Tommy

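Just since @Tommy brought it up: the solution that converts m to a vector takes 0.14 seconds on my machine, versus 0.50 seconds for the first example above ;) – BenBarnes 2012-03-22 09:49:50

I marked @DWin's answer as correct because it is the one I ended up using (I prefer simplicity and have no time constraints), but this answer was also very useful and I really appreciate the effort put into it. Thanks! – Ina 2012-03-22 14:14:24

You can also convert the matrix m to a vector and then match the IDs to the colour and shape values: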

df<-data.frame(colour=c("red","blue","blue","green"), 
    shape=c("circle","square","circle","sphere")) 


m<-matrix(1:9,nrow=3,dimnames=list(c("red","blue","green"), 
    c("circle","square","sphere"))) 


mVec<-as.vector(m)

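The next step matches the colour in df to the appropriate dimname of the matrix m, then adds an integer offset corresponding to the shape. The result is an index into the vector mVec that picks out the corresponding ID.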

df$ID<-mVec[match(df$colour, dimnames(m)[[1]]) + (dim(m)[1]* 
    (match(df$shape, dimnames(m)[[2]]) - 1))]
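
The arithmetic in that expression relies on R storing matrices column by column, so the element in row r and column c of a matrix with n rows sits at position r + n * (c - 1) of the flattened vector. A small worked check, using a hypothetical matrix of tens (vals) so that positions and values are not accidentally equal:

```r
# Hypothetical lookup matrix with values 10, 20, ..., 90
vals <- matrix(seq(10, 90, by = 10), nrow = 3,
               dimnames = list(c("red", "blue", "green"),
                               c("circle", "square", "sphere")))

r <- match("blue", rownames(vals))     # row 2
cc <- match("square", colnames(vals))  # column 2

# Column-major position: 2 + 3 * (2 - 1) = 5
lin <- r + nrow(vals) * (cc - 1)
lin

# Both lookups return the same element
as.vector(vals)[lin]
vals["blue", "square"]
```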


2012-03-21 21:19:22 BenBarnes

+1 Fastest! – Tommy 2012-03-22 16:01:15

Another answer, using the reshape2 and plyr packages (plyr is optional, just for the join).

require(plyr) 
require(reshape2) 

Df <- data.frame(colour = c("red", "blue", "blue", "green"), 
        shape = c("circle", "square", "circle", "sphere")) 

Mat <- matrix(1:9, dimnames = list(c("red", "blue", "green"), 
            c("circle", "square", "sphere")), 
        nrow = 3) 

Df2 <- melt.array(Mat, varnames = c("colour", "shape")) 

join(Df, Df2) 
result <- join(Df, Df2) 

join(Df, Df2) 
Joining by: colour, shape 
  colour  shape value
1    red circle     1
2   blue square     5
3   blue circle     2
4  green sphere     9

Hope this helps.

2012-03-21 21:21:48 dickoa

#recreating your data 
dat <- read.table(text="colour shape 
'red' circle 
'blue' square 
'blue' circle 
'green' sphere", header=TRUE) 

d2 <- matrix(c(1:9), ncol=3, nrow=3, byrow=TRUE) 
dimnames(d2) <-list(c('circle', 'square', 'sphere'), 
c("red", "blue", "green")) 
d2<-as.table(d2) 

#make a list of matching to the row and column names of the look up matrix 
LIST <- list(match(dat[, 2], rownames(d2)), match(dat[, 1], colnames(d2))) 
#use sapply to index the lookup matrix using the row and col values from LIST 
id <- sapply(seq_along(LIST[[1]]), function(i) d2[LIST[[1]][i], LIST[[2]][i]]) 
#put it all back together 
data.frame(id=id, dat)
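
Since the question asks for a solution without loops, it is worth noting that the sapply step above can be replaced by a single matrix index once the row and column positions are known. The sketch below assumes the same layout as this answer (shapes as rows, colours as columns); idx, id_vec and id_loop are illustrative names, and the sapply version is kept only to compare the results.

```r
dat <- data.frame(colour = c("red", "blue", "blue", "green"),
                  shape  = c("circle", "square", "circle", "sphere"),
                  stringsAsFactors = FALSE)

d2 <- matrix(1:9, nrow = 3, byrow = TRUE,
             dimnames = list(c("circle", "square", "sphere"),
                             c("red", "blue", "green")))

# Row and column positions for each record, as a two-column matrix
idx <- cbind(match(dat$shape, rownames(d2)),
             match(dat$colour, colnames(d2)))

# Vectorised lookup: one element per row of idx
id_vec <- d2[idx]

# Element-by-element lookup, equivalent to the sapply call above
id_loop <- sapply(seq_len(nrow(dat)),
                  function(k) d2[idx[k, 1], idx[k, 2]])

data.frame(id = id_vec, dat)
```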


2012-03-21 21:31:35
