2014-06-21 67 views
2

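Replace all instances of a number in a data frame with a string in R

I am trying to replace all the numbers in a data frame with words/strings. Each number should always be replaced by the same word. For example, all instances of the number 5 should be replaced with 'banana', all instances of 10 with 'kiwi', and so on.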

Below is a sample data frame. The row names and column names are also numbers:

# 1 2 3 4 5 6 
#1 7 7 7 7 7 7 
#2 5 5 5 5 5 5 
#3 4 4 4 4 4 4 
#4 8 8 8 8 8 8 
#5 1 1 1 1 1 1 
#6 2 2 2 2 2 2 
#7 6 6 6 6 3 3 
#8 3 3 3 3 6 6 
#9 10 10 10 10 10 10 
#10 11 11 11 11 11 11 
#11 12 12 12 12 12 12 
#12 9 9 9 9 9 9 

Here is the code to reproduce this sample data (mydf):

mydf<-structure(c(7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 
1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 
9, 7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 3, 
6, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 3, 6, 10, 11, 12, 9), .Dim = c(12L, 
6L), .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12"), c("1", "2", "3", "4", "5", "6"))) 

Here is a data frame (mydata) I constructed that shows which word/fruit should replace each number:

mydata <- data.frame(nums = c(1:12))      
mydata$fruits<-c("apple", "pear", "orange", "melon", "banana", "grape", "pineapple",  "mango", "lemon", "kiwi", "guava", "peach") 

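I searched through similarly named topics, but they mostly discuss changing certain parts of a data frame (such as specific variables or specific observations), not the entire contents of the data frame.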

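I tried using multiple gsub commands, but that doesn't work for a number of reasons. I think I need a function that applies to all the variables in the df, but I don't know which one.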

The final result should look like this:

 1   2   3   4   5   6   
1 "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" 
2 "banana" "banana" "banana" "banana" "banana" "banana" 
3 "melon"  "melon"  "melon"  "melon"  "melon"  "melon"  
4 "mango"  "mango"  "mango"  "mango"  "mango"  "mango"  
5 "apple"  "apple"  "apple"  "apple"  "apple"  "apple"  
6 "pear"  "pear"  "pear"  "pear"  "pear"  "pear"  
7 "grape"  "grape"  "grape"  "grape"  "orange" "orange" 
8 "orange" "orange" "orange" "orange" "grape"  "grape"  
9 "kiwi"  "kiwi"  "kiwi"  "kiwi"  "kiwi"  "kiwi"  
10 "guava"  "guava"  "guava"  "guava"  "guava"  "guava"  
11 "peach"  "peach"  "peach"  "peach"  "peach"  "peach"  
12 "lemon"  "lemon"  "lemon"  "lemon"  "lemon"  "lemon" 

Ideally, though, the quotes would not be visible (I'm not sure whether that is possible).

Answers

4

You can do this with match, which, given a lookup vector (here taken from your mydata), returns the position in it of each element of another vector.

mydf[] <- mydata$fruits[match(mydf, mydata$nums)] 
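
To make the role of match more concrete, here is a minimal sketch on a small matrix. The names lookup, m and pos are made up for illustration, and the lookup rows are deliberately out of order to show that match does not depend on the table being sorted.

```r
# Hypothetical lookup table whose rows are deliberately not in numeric order
lookup <- data.frame(nums = c(3, 1, 2),
                     fruits = c("orange", "apple", "pear"),
                     stringsAsFactors = FALSE)

# A small matrix of codes to translate (filled column by column)
m <- matrix(c(1, 2, 3, 3), nrow = 2)

# match() gives, for each code, its position in lookup$nums
pos <- match(m, lookup$nums)
pos
# [1] 2 3 1 1

# Assigning into m[] replaces the contents but keeps the dimensions of m
m[] <- lookup$fruits[pos]
m
```

Assigning to m[] rather than to m is what preserves the shape and any row/column names; a plain m <- ... would leave a flat character vector.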

If you coerce the result to a data.frame, the quotes are not shown when you print the object to the screen:

as.data.frame(mydf) 

#   1   2   3   4   5   6 
# 1 pineapple pineapple pineapple pineapple pineapple pineapple 
# 2  banana banana banana banana banana banana 
# 3  melon  melon  melon  melon  melon  melon 
# 4  mango  mango  mango  mango  mango  mango 
# 5  apple  apple  apple  apple  apple  apple 
# 6  pear  pear  pear  pear  pear  pear 
# 7  grape  grape  grape  grape orange orange 
# 8  orange orange orange orange  grape  grape 
# 9  kiwi  kiwi  kiwi  kiwi  kiwi  kiwi 
# 10  guava  guava  guava  guava  guava  guava 
# 11  peach  peach  peach  peach  peach  peach 
# 12  lemon  lemon  lemon  lemon  lemon  lemon  

Whether or not you coerce to a data.frame, you can pass quote=FALSE to write.table or write.csv to prevent quotes from appearing around strings in the exported file.
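
As a rough illustration of the quoting options, the sketch below uses a small stand-in character matrix (m is a made-up name, not the full mydf) and a temporary file chosen only for the example.

```r
# Small character matrix standing in for the converted mydf
m <- matrix(c("pineapple", "banana", "melon", "mango"), nrow = 2,
            dimnames = list(c("1", "2"), c("1", "2")))

# Print to the console without quotes, without converting to a data.frame
print(m, quote = FALSE)

# Export without quotes around the strings; tempfile() is just for illustration
out <- tempfile(fileext = ".csv")
write.csv(m, out, quote = FALSE)

# Inspect the raw lines that were written
csv_lines <- readLines(out)
csv_lines
```

noquote(m) is another base R way to get the same unquoted console display while keeping the object a matrix.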

+2

Since the lookup data is already sorted, 'mydf[] <- mydata$fruits[mydf]' will work. – thelatemail

+0

@thelatemail: Yes, baptiste mentioned that. I gave a generally applicable solution because I didn't want to assume the OP's real problem is as simple as the example. – jbaums

+1

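Thanks. This works well, as do all the suggestions, actually. It's nice to see how a problem like this can be solved in several ways. Thanks also for the general solution. Obviously, my real data problem is much more complex than the simple example... and has nothing to do with fruit! – jalapic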

0

replace might work for you.

> replace(mydf, seq_along(mydf), mydata[[2]][mydf]) 
# 1   2   3   4   5   6   
# 1 "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" 
# 2 "banana" "banana" "banana" "banana" "banana" "banana" 
# 3 "melon"  "melon"  "melon"  "melon"  "melon"  "melon"  
# 4 "mango"  "mango"  "mango"  "mango"  "mango"  "mango"  
# 5 "apple"  "apple"  "apple"  "apple"  "apple"  "apple"  
# 6 "pear"  "pear"  "pear"  "pear"  "pear"  "pear"  
# 7 "grape"  "grape"  "grape"  "grape"  "orange" "orange" 
# 8 "orange" "orange" "orange" "orange" "grape"  "grape"  
# 9 "kiwi"  "kiwi"  "kiwi"  "kiwi"  "kiwi"  "kiwi"  
# 10 "guava"  "guava"  "guava"  "guava"  "guava"  "guava"  
# 11 "peach"  "peach"  "peach"  "peach"  "peach"  "peach"  
# 12 "lemon"  "lemon"  "lemon"  "lemon"  "lemon"  "lemon" 

It can be wrapped in as.data.frame to remove the quotes, if necessary.

0

Since the fruits are in the correct order and indexed by 1:12, you can use the entries of mydf to index mydata$fruits:

apply(mydf, 2, function(x) mydata$fruits[x]) 

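If the values are not in the correct order, or do not cover all possible values (there are "holes"), you can use a factor to do the translation: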

apply(mydf, 2, function(x) factor(x, levels=mydata$nums, labels=mydata$fruits)) 
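
The difference between the two approaches shows up as soon as the lookup has a gap. The following is a small sketch with made-up names (codes, fruit_labels, x), assuming a lookup that has no entry for 3.

```r
# Hypothetical lookup with a gap: there is no entry for code 3
codes        <- c(1, 2, 4)
fruit_labels <- c("apple", "pear", "melon")

x <- c(4, 1, 3, 2)

# Direct indexing treats x as positions: 4 runs off the end (NA) and
# 3 silently picks up the label that belongs to code 4
by_position <- fruit_labels[x]
by_position
# [1] NA      "apple" "melon" "pear"

# factor() matches on the values themselves; the unmatched 3 becomes NA
by_value <- as.character(factor(x, levels = codes, labels = fruit_labels))
by_value
# [1] "melon" "apple" NA      "pear"
```

With the question's mydata, where nums is exactly 1:12 in order, both forms give the same result; they only diverge for lookups like the one assumed here.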

0

Another possible approach:

library(qdapTools) 
as.data.frame(apply(mydf, 2, lookup, mydata)) 

##   1   2   3   4   5   6 
## 1 pineapple pineapple pineapple pineapple pineapple pineapple 
## 2  banana banana banana banana banana banana 
## 3  melon  melon  melon  melon  melon  melon 
## 4  mango  mango  mango  mango  mango  mango 
## 5  apple  apple  apple  apple  apple  apple 
## 6  pear  pear  pear  pear  pear  pear 
## 7  grape  grape  grape  grape orange orange 
## 8  orange orange orange orange  grape  grape 
## 9  kiwi  kiwi  kiwi  kiwi  kiwi  kiwi 
## 10  guava  guava  guava  guava  guava  guava 
## 11  peach  peach  peach  peach  peach  peach 
## 12  lemon  lemon  lemon  lemon  lemon  lemon 