2012-06-29

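Function for converting data frame column types

R often reads data frame columns in the "wrong" format, or you may simply need to change a column's class from factor to character in order to modify it. Until now I have been changing column classes like this: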

set.seed(1) 

df <- data.frame(x = 1:10, 
y = rep(1:2, 5), 
k = rnorm(10, 5,2), 
z = rep(c(2010, 2012, 2011, 2010, 1999), 2), 
j = c(rep(c("a", "b", "c"), 3), "d")) 

x <- c("y", "z") 

for(i in seq_along(x)){
  df[,x[i]] <- factor(df[,x[i]])
}
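The loop above can likely be replaced by a single 'lapply()' call over the selected columns. Indexing with 'df[x]' (no comma) keeps a data frame, so the same line works for one or several columns. A minimal sketch that re-creates the example data:

```r
set.seed(1)
df <- data.frame(x = 1:10,
                 y = rep(1:2, 5),
                 k = rnorm(10, 5, 2),
                 z = rep(c(2010, 2012, 2011, 2010, 1999), 2),
                 j = c(rep(c("a", "b", "c"), 3), "d"))

x <- c("y", "z")
# apply factor() to each selected column and write the results back in place
df[x] <- lapply(df[x], factor)
str(df)
```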

And back to numeric:

x <- 1:5

for(i in seq_along(x)){
  # column 5 (j) holds letters, so it cannot become numeric: as.numeric() returns NA with a warning
  df[,x[i]] <- as.numeric(as.character(df[,x[i]]))
}
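The 'as.character()' step in the loop above matters: calling 'as.numeric()' directly on a factor returns its internal integer codes, not the original values. A small illustration with made-up years:

```r
f <- factor(c(2010, 2012, 1999))
# levels are sorted ("1999" "2010" "2012"), so the codes are 2 3 1
codes <- as.numeric(f)
values <- as.numeric(as.character(f))
print(codes)   # 2 3 1
print(values)  # 2010 2012 1999
```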

It occurred to me that there might be a better way of doing this. I found this question, which is almost exactly what I need:

convert.magic <- function(obj,types){
  out <- lapply(1:length(obj), FUN = function(i){
    # pick the converter named by types[i], then apply it to column i
    FUN1 <- switch(types[i],
                   character = as.character,
                   numeric = as.numeric,
                   factor = as.factor)
    FUN1(obj[,i])
  })
  names(out) <- colnames(obj)
  as.data.frame(out)
}

However, this function requires the types vector to specify a type for every column:

convert.magic(df, rep("factor",5)) 

convert.magic(df, c("character", "factor")) 
# Error in FUN(1:5[[1L]], ...) : could not find function "FUN1" 

Can someone help me rebuild this function so that it works with column names and column numbers? I'm afraid this is too advanced for me...

# desired usage: convert only the columns listed in x (given by name or by number)
x <- c("y", "z")
convert.magic(df, "character", x)

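If you are only converting factors to numeric, note this from the 'factor' help page (?factor): "To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f))." This also suggests that 'convert.magic' may give unexpected results in some cases. – BenBarnes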
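Ben's suggestion in code: 'levels(f)' holds each distinct label once, so converting the levels and then indexing by the factor (which indexes by its integer codes) should give the same result as converting every element, while calling 'as.numeric()' only on the levels:

```r
f <- factor(rep(c(2010, 2012, 2011, 2010, 1999), 2))
# convert each distinct level once, then index by the factor's codes
via_levels <- as.numeric(levels(f))[f]
via_character <- as.numeric(as.character(f))
stopifnot(identical(via_levels, via_character))
print(via_levels)
```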


@BenBarnes Hmm... I didn't know that. Very good comment, thanks! – Mikko

Answer

df <- data.frame(x = 1:10, 
       y = rep(1:2, 5), 
       k = rnorm(10, 5,2), 
       z = rep(c(2010, 2012, 2011, 2010, 1999), 2), 
       j = c(rep(c("a", "b", "c"), 3), "d")) 

convert.magic <- function(obj, type){ 
    FUN1 <- switch(type, 
       character = as.character, 
       numeric = as.numeric, 
       factor = as.factor) 
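    out <- lapply(obj, FUN1)
    # stringsAsFactors = FALSE keeps the "character" conversion from being undone
    # (the default was TRUE before R 4.0.0)
    as.data.frame(out, stringsAsFactors = FALSE)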
} 

str(df) 
str(convert.magic(df, "character")) 
str(convert.magic(df, "factor")) 
df[, c("x", "y")] <- convert.magic(df[, c("x", "y")], "factor") 

This converts the whole data.frame. A small modification gets closer to what I'm after: 'convert.magic <- function(obj, type, cols){ FUN1 <- switch(type, character = as.character, numeric = as.numeric, factor = as.factor); obj[, cols] <- lapply(obj[, cols], FUN1); as.data.frame(obj) }' How would I add BenBarnes' suggestion ('as.numeric(levels(f))[f]') to this function? – Mikko
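One detail in that modification: 'obj[, cols]' drops to a plain vector when 'cols' names a single column, so 'lapply()' would then loop over individual values instead of over the column. Indexing the data frame as a list, 'obj[cols]', always returns a data frame. A small check on made-up data:

```r
df <- data.frame(y = rep(1:2, 5), z = rep(c(2010, 1999), 5))
cls_vec <- class(df[, "y"])  # "integer": the single column is dropped to a vector
cls_df <- class(df["y"])     # "data.frame": list-style indexing keeps the structure
# list-style indexing converts exactly the listed columns, whether one or several
df["y"] <- lapply(df["y"], as.factor)
str(df)
```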


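@Largh Instead of using 'as.numeric' in the 'switch' statement, you could write a simple wrapper that checks whether its input is a factor. If it is, use Ben's method; otherwise use 'as.numeric'. – joran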
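A minimal sketch of joran's suggestion combined with the 'cols' argument from Mikko's comment; this is one possible version, not code from the thread. The helper name 'to_numeric' is made up for this example, unknown 'type' values raise an error instead of silently returning NULL, and 'cols' may be names or positions (defaulting to all columns):

```r
# hypothetical helper: factor-aware numeric conversion (Ben's method for factors)
to_numeric <- function(v) {
  if (is.factor(v)) as.numeric(levels(v))[v] else as.numeric(v)
}

convert.magic <- function(obj, type, cols = seq_along(obj)) {
  FUN1 <- switch(type,
                 character = as.character,
                 numeric   = to_numeric,
                 factor    = as.factor,
                 stop("unsupported type: ", type))
  # list-style indexing keeps a data frame even for a single column
  obj[cols] <- lapply(obj[cols], FUN1)
  obj
}

df <- data.frame(x = 1:10,
                 y = rep(1:2, 5),
                 z = rep(c(2010, 2012, 2011, 2010, 1999), 2),
                 j = c(rep(c("a", "b", "c"), 3), "d"),
                 stringsAsFactors = FALSE)

df <- convert.magic(df, "factor", c("y", "z"))  # by name
df <- convert.magic(df, "character", 4)         # by position (column j)
df <- convert.magic(df, "numeric", "z")         # factor back to its original values
str(df)
```

Returning 'obj' directly (rather than rebuilding it with 'as.data.frame()') leaves the untouched columns exactly as they were.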