2014-07-07

I want to transform the values of a given column using a mapping function. For example:

df <- data.frame(A = 1:5, B = sample(1:20, 10)) 
df 
    A B 
1 1 17 
2 2 5 
3 3 3 
4 4 11 
5 5 19 
6 1 16 
7 2 4 
8 3 7 
9 4 6 
10 5 9 

My goal is to map every element of column A as follows:

1 -> "tt" 
2 -> "ff" 
3 -> "ss" 
4 -> "fs" 
5 -> "sf" 

I wrote the following:

mappingList <- c("tt", "ff", "ss", "fs", "sf") 
df$A <- unlist(lapply(df$A, function(x){replace(x, x>0, mappingList[x])})) 
df 
    A B 
1 tt 17 
2 ff 5 
3 ss 3 
4 fs 11 
5 sf 19 
6 tt 16 
7 ff 4 
8 ss 7 
9 fs 6 
10 sf 9 

The code above works fine.

Now suppose another data frame in which column A does not hold the integers 1, 2, 3, 4, 5 but some other "generic" items, say:

df <- data.frame(A = paste("str",1:5,sep=""), B = sample(1:20, 10)) 

df <- data.frame(A = seq(5, 25, by=5), B = sample(1:20, 10)) 

Question: how would you write the mapping?

Answers

Have a look at factor:

df$A_2 <- factor(df$A, levels = 1:5, labels = c("tt", "ff", "ss", "fs", "sf")) 
df 
# A B A_2 
# 1 1 17 tt 
# 2 2 5 ff 
# 3 3 3 ss 
# 4 4 11 fs 
# 5 5 19 sf 
# 6 1 16 tt 
# 7 2 4 ff 
# 8 3 7 ss 
# 9 4 6 fs 
# 10 5 9 sf 

Basically, your levels argument should match the original values, and your labels argument should hold the replacement values.
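As a rough sketch of the same idea applied to the numeric data frame from the question: the name df2 and the fixed B column are assumptions made here so the example is reproducible, and as.character() is only needed if a plain character column is preferred over a factor. Any value of A that is not listed in levels would come out as NA.

```r
# Hypothetical data: column A holds 5, 10, ..., 25 (recycled to 10 rows)
df2 <- data.frame(A = rep(seq(5, 25, by = 5), 2), B = 1:10)

# levels lists the original values; labels lists the replacements, in the same order
df2$A_2 <- factor(df2$A,
                  levels = seq(5, 25, by = 5),
                  labels = c("tt", "ff", "ss", "fs", "sf"))

# Convert to plain character if a factor column is not wanted
df2$A_chr <- as.character(df2$A_2)
df2$A_chr
```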


You can also create a lookup table with a named vector.

Example:

df <- data.frame(A = paste("str",1:5,sep=""), B = sample(1:20, 10)) 

NamedVec <- setNames(paste("str",1:5,sep=""), c("tt", "ff", "ss", "fs", "sf")) 
NamedVec 
#  tt  ff  ss  fs  sf 
# "str1" "str2" "str3" "str4" "str5" 
NamedVec[df$A] 
#  tt  ff  ss  fs  sf  tt  ff  ss  fs  sf 
# "str1" "str2" "str3" "str4" "str5" "str1" "str2" "str3" "str4" "str5" 
names(NamedVec[df$A]) 
# [1] "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf" 
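Note that the example above indexes NamedVec by df$A, which only produces this output when A is stored as a factor (the data.frame default before R 4.0.0), because the factor's integer codes are then used as positions. A variant that should not depend on that default is sketched below: the names of the lookup vector are the original values and its elements are the replacements, so the lookup is by name. The names lookup and df1 are illustrative, not from the original answer.

```r
# Hypothetical data with a character column A (B is fixed for reproducibility)
df1 <- data.frame(A = paste0("str", 1:5), B = 1:10, stringsAsFactors = FALSE)

# Lookup table: names are the original values, elements are the replacements
lookup <- setNames(c("tt", "ff", "ss", "fs", "sf"), paste0("str", 1:5))

# as.character() forces lookup by name even if A happens to be a factor;
# unname() drops the "str1", "str2", ... names from the result
df1$A_2 <- unname(lookup[as.character(df1$A)])
df1$A_2
```

Values of A that have no entry in lookup come back as NA, which makes unmapped items easy to spot.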
Specifically, `df$A <- factor(df$A, levels = c("str1", "str2", "str3", "str4", "str5"), labels = c("tt", "ff", "ss", "fs", "sf"))` and `df$A <- factor(df$A, levels = c(5, 10, 15, 20, 25), labels = c("tt", "ff", "ss", "fs", "sf"))` – MrFlick

Great, I like the idea of using a named vector as a lookup table. Thanks a lot, Ananda! – Riad

@Riad, related reading from SO user [@PaulHiemstra](http://stackoverflow.com/users/1033808/paul-hiemstra): http://www.numbertheory.nl/2014/01/25/vectorisation-is-your-best-friend-replacing-many-elements-in-a-character-vector/ – A5C1D2H2I1M1N2O1R2T1

Try:

mappingList[df$A] 
#[1] "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf" 

For the two other datasets:

df1 <- data.frame(A = paste("str",1:5,sep=""), B = sample(1:20, 10)) 
df2 <- data.frame(A = seq(5, 25, by=5), B = sample(1:20, 10)) 

# factor() makes this work whether A is stored as character or as a factor
mappingList[as.numeric(factor(df1$A))]
#[1] "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf" 

mappingList[as.numeric(factor(df2$A))] 
#[1] "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf"
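The factor-based indexing above works because the sorted unique values of A happen to line up with the order of mappingList. A sketch that makes the pairing explicit uses match(), which returns the position of each value in a vector of original values. The helper map_values and its arguments from and to are hypothetical names introduced here for illustration.

```r
mappingList <- c("tt", "ff", "ss", "fs", "sf")

# Hypothetical helper: replace each element of x found in `from`
# with the element of `to` at the same position (NA if not found)
map_values <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  to[match(x, from)]
}

# Numeric keys, in arbitrary order
map_values(c(5, 25, 10, 5), from = seq(5, 25, by = 5), to = mappingList)

# Character keys
map_values(c("str3", "str1"), from = paste0("str", 1:5), to = mappingList)
```

Because the original values are passed in explicitly, the result does not depend on how the values sort or on whether the column is stored as a factor.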