編輯:看來這個問題發生了巨大變化,而我回答,但無論如何,這裏留下這個。
我假設@ joran的評論是正確的,你的意思是(和固定在lev
順序)
lev <- c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot")
A <- factor(sample(lev, 6000, TRUE), levels=lev)
B <- factor(sample(lev, 6000, TRUE), levels=lev)
而且,mapping
不是一個二維數組(矩陣),也不是一個嵌套的數據結構(名單列表)就像你似乎認爲
> mapping
alpha.alpha alpha.bravo alpha.charlie alpha.delta alpha.echo
"green" "blue" "blue" "red" "red"
alpha.foxtrot bravo.alpha bravo.bravo bravo.charlie bravo.delta
"red" "blue" "green" "blue" "red"
bravo.echo bravo.foxtrot charlie.alpha charlie.bravo charlie.charlie
"red" "red" "blue" "blue" "green"
charlie.delta charlie.echo charlie.foxtrot delta.alpha delta.bravo
"red" "red" "red" "red" "red"
delta.charlie delta.delta delta.echo delta.foxtrot echo.alpha
"red" "green" "yellow" "red" "red"
echo.bravo echo.charlie echo.delta echo.echo echo.foxtrot
"red" "red" "yellow" "red" "red"
foxtrot.alpha foxtrot.bravo foxtrot.charlie foxtrot.delta foxtrot.echo
"red" "red" "red" "red" "red"
foxtrot.foxtrot
"green"
現在,如果你想保存這是一個列表的列表:
mapping <- list(
"alpha" = list("alpha"="green", "bravo"="blue", "charlie"="blue",
"delta"="red", "echo"="red", "foxtrot"="red"),
"bravo" = list("alpha"="blue", "bravo"="green", "charlie"="blue",
"delta"="red", "echo"="red", "foxtrot"="red"),
"charlie" = list("alpha"="blue", "bravo"="blue", "charlie"="green",
"delta"="red", "echo"="red", "foxtrot"="red"),
"delta" = list("alpha"="red", "bravo"="red", "charlie"="red",
"delta"="green", "echo"="yellow", "foxtrot"="red"),
"echo" = list("alpha"="red", "bravo"="red", "charlie"="red",
"delta"="yellow", "echo"="red", "foxtrot"="red"),
"foxtrot" = list("alpha"="red", "bravo"="red", "charlie"="red",
"delta"="red", "echo"="red", "foxtrot"="green")
)
mapper = function(X, Y) mapping[[levels(X)[X]]][[levels(Y)[Y]]]
請注意,我在創造mapping
和mapper
使用list
代替c
使用提取([[
)未子集化([
)符號。
檢查這個工程的一個值:
> mapper(A[1], B[1])
[1] "red"
而對於只有幾個值:
> mapper(A[1:2], B[1:2])
Error in mapping[[levels(X)[X]]][[levels(Y)[Y]]] :
attempt to select more than one element
所以我們看到mapper
沒有矢量(因爲它必須是)。從outer
的幫助頁面:
FUN
以這兩個擴展向量爲參數被調用。因此,它必須是一個矢量化函數(或一個名稱),期望至少有兩個參數。
最簡單的,但不一定是有效的,方式向量化它:
> Vectorize(mapper)(A[1:2], B[1:2])
[1] "red" "green"
這現在工作在一個子集:
> outer(A[1:6], B[1:6], FUN=Vectorize(mapper))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "red" "yellow" "red" "red" "red" "red"
[2,] "red" "green" "red" "red" "red" "yellow"
[3,] "red" "green" "red" "red" "red" "yellow"
[4,] "blue" "red" "blue" "red" "blue" "red"
[5,] "green" "red" "green" "red" "green" "red"
[6,] "red" "red" "red" "green" "red" "red"
讓我們來看看時序:
> system.time(outer(A[1:6], B[1:6], FUN=Vectorize(mapper)))
user system elapsed
0 0 0
> system.time(outer(A[1:60], B[1:60], FUN=Vectorize(mapper)))
user system elapsed
0.22 0.00 0.22
> system.time(outer(A[1:600], B[1:600], FUN=Vectorize(mapper)))
user system elapsed
23.97 0.01 24.01
查看外部產品長度的線性,或q在A或B的長度上是不成熟的。我沒有等待40分鐘,看看6000x6000是否可以工作。
我們可以讓這個效率更高嗎?雙遞歸到遞歸結構(然後在頂部使用Vectorize
)並不是那麼有效。讓我們使用不同的數據結構:一個二維數組(矩陣)並使用基於矩陣的索引。
mapping <- matrix(c("green", "blue", "blue", "red", "red", "red",
"blue", "green", "blue", "red", "red", "red",
"blue", "blue", "green", "red", "red", "red",
"red", "red", "red", "green", "yellow", "red",
"red", "red", "red", "yellow", "red", "red",
"red", "red", "red", "red", "red", "green"),
nrow = 6, ncol = 6,
dimnames = list(lev, lev))
mapper <- function(X, Y) mapping[cbind(as.character(X), as.character(Y))]
和測試這個
> A[1:6]
[1] echo delta delta charlie alpha foxtrot
Levels: alpha bravo charlie echo delta foxtrot
> B[1:6]
[1] alpha delta alpha foxtrot alpha echo
Levels: alpha bravo charlie echo delta foxtrot
> mapper(A[1], B[1])
[1] "red"
> mapper(A[1:2], B[1:2])
[1] "red" "green"
> outer(A[1:6], B[1:6], FUN=mapper)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "red" "yellow" "red" "red" "red" "red"
[2,] "red" "green" "red" "red" "red" "yellow"
[3,] "red" "green" "red" "red" "red" "yellow"
[4,] "blue" "red" "blue" "red" "blue" "red"
[5,] "green" "red" "green" "red" "green" "red"
[6,] "red" "red" "red" "green" "red" "red"
看起來不錯。檢查時間:
> system.time(outer(A[1:6], B[1:6], FUN=mapper))
user system elapsed
0 0 0
> system.time(outer(A[1:60], B[1:60], FUN=mapper))
user system elapsed
0 0 0
> system.time(outer(A[1:600], B[1:600], FUN=mapper))
user system elapsed
0.22 0.00 0.22
> system.time(outer(A, B, FUN=mapper))
user system elapsed
7.80 1.48 9.30
稍微超過9秒而不是約40分鐘約250倍的加速。
小問題:你的意思是'sample(lev,6000,TRUE)'?否則,你只是得到'NA',我想...... – joran
也可以給你一些期望的輸出,以確保我們得到正確的答案? – MrFlick
考慮到這一點,我的第一反應是想知道你是否必須有(三角形)矩陣輸出。這感覺更像是一個加入(或合併)給我。 – joran