2015-08-24 41 views
1

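How to transform data.table columns indexed by position, by reference?

I have a data.table that contains several factor columns. I want to convert two columns that were originally read in as factors back to their original numeric values. Here is what I have tried: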

data[, c(4,5):=c(as.numeric(as.character(4)), as.numeric(as.character(5))), with=FALSE] 

This gives me the following warnings:

Warning messages: 
1: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), : 
    Supplied 2 items to be assigned to 7 items of column 'Bentley (R)' (recycled leaving remainder of 1 items). 
2: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), : 
    Supplied 2 items to be assigned to 7 items of column 'Sparks (D)' (recycled leaving remainder of 1 items). 
3: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), : 
    Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first. 
4: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), : 
    Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first. 

And I can tell the conversion did not succeed, because the 4th and 5th columns are still factors after the code has run.

As an alternative, I tried this code, which does not run at all:

data[, ':=' (4=c(as.numeric(as.character(4)), 5 = as.numeric(as.character(5)))), with=FALSE] 

Finally, I tried referring to the column by name via colnames:

data[ , (colnames(data)[4]) := as.numeric(as.character(colnames(data)[4]))] 

This runs, but the result is a column of NAs, along with the following warnings:

Warning messages: 
1: In eval(expr, envir, enclos) : NAs introduced by coercion 
2: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) : 
    Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first. 
3: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) : 
    RHS contains -2147483648 which is outside the levels range ([1,6]) of column 1, NAs generated 

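I need to do this by position rather than by column name, because the column names depend on the URL. What is the correct way to convert columns by position with data.table?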

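I also have a related question: how do I transform a numbered column relative to other numbered columns? For example, if I want to set the third column equal to 45 minus the value of the third column plus the value of the fourth column, how would I do that? Is there a way to distinguish a literal number from a column number? I know that something like this is not the way to go: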

dt[ , .(4) = 45 - .(3) + .(4), with = FALSE] 

So how can this be done?

+1

Indexing by position is apparently bad practice, but here you go: 'dt[, 4] <- 45 - dt[[3]] + dt[[4]]' – Frank

+0

Please do *not* use '<-' to add or update columns in a 'data.table'. Use ':='. That is the idiomatic way. Have you looked at the [new HTML vignettes](https://github.com/Rdatatable/data.table/wiki/Getting-started)? – Arun

Answer

5

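If you want to assign by reference and by position, you need to pass the column names as a character vector, or the column numbers as an integer vector, on the left-hand side of :=, and use .SDcols (at least in data.table 1.9.4).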
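As a rough sketch of why the attempts in the question go wrong (base R only; the column name is copied from the warning text, the variable names are purely illustrative): inside j, a bare 4 is the number 4, not column 4, and colnames(data)[4] is the column's name as a string, not its values.

```r
# A bare number in j is just a number: this yields c(4, 5), a length-2
# vector that then has to be recycled over every row of the table.
rhs <- c(as.numeric(as.character(4)), as.numeric(as.character(5)))
print(rhs)

# colnames(data)[4] is a string such as the name seen in the warnings;
# converting the name (not the column contents) to numeric gives NA.
col_name <- "Bentley (R)"
converted <- suppressWarnings(as.numeric(as.character(col_name)))
print(converted)
```

This would be consistent with the "Supplied 2 items" and "NAs introduced by coercion" warnings shown in the question.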

First, a reproducible example:

library(data.table) 
DT <- data.table(iris) 
DT[, c("Sepal.Length", "Petal.Length") := list(factor(Sepal.Length), factor(Petal.Length))] 
str(DT) 

Now let's convert the columns:

DT[, names(DT)[c(1, 3)] := lapply(.SD, function(x) as.numeric(as.character(x))), 
    .SDcols = c(1, 3)] 
str(DT) 

Or:

DT[, c(1,3) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols=c(1,3)] 
str(DT) 
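Both versions go through as.character() before as.numeric(). A small base-R sketch of why that intermediate step matters, using a made-up factor f: calling as.numeric() directly on a factor returns the internal level codes rather than the values the labels spell out.

```r
# A hypothetical factor whose labels look like numbers.
f <- factor(c("5.1", "4.9", "7"))

# Direct conversion returns the integer level codes (levels are sorted
# as "4.9", "5.1", "7"), not the original values.
codes <- as.numeric(f)
print(codes)    # 2 1 3

# Converting the labels to character first recovers the actual numbers.
values <- as.numeric(as.character(f))
print(values)   # 5.1 4.9 7.0
```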

Note that := expects a vector of column names or positions on the left-hand side and a list on the right-hand side.
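For the follow-up question (set column 3 to 45 minus column 3 plus column 4), one possible approach along the same lines is sketched below on a fresh copy of iris rather than the asker's table. The names target, expected and DT2 are only for illustration. Because the columns are pulled out of .SD by position, the literal 45 stays an ordinary number and cannot be confused with a column index.

```r
library(data.table)

DT <- data.table(iris)
expected <- 45 - iris[[3]] + iris[[4]]

# Look up the name of the target column by position.
target <- names(DT)[3]

# .SDcols = c(3, 4) restricts .SD to columns 3 and 4, so .SD[[1]] is
# column 3 and .SD[[2]] is column 4; 45 is a plain number.
DT[, (target) := 45 - .SD[[1]] + .SD[[2]], .SDcols = c(3, 4)]

# Alternative: set() takes the column position directly in j and also
# updates the table by reference.
DT2 <- data.table(iris)
set(DT2, j = 3L, value = 45 - DT2[[3]] + DT2[[4]])
```

Wrapping target in parentheses tells := to use the value of the variable as the column name instead of creating a column literally called "target".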

+1

@Frank Thanks for the suggestion. – Roland
