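After reading a large dataset with read.csv.ffdf, one of the columns holds timestamps such as 2014-10-18 00:01:02. The column has 1 million rows and is stored as a factor. How can I convert it to POSIXct in a way that ff supports? Simply calling as.POSIXct() just turns the values into NA.
How do I convert a factor vector to POSIXct in ff or ffbase? Alternatively, can I specify that the column is POSIXct when I first read the dataset?
My goal is to extract the month and day (or even the hour), so I am open to solutions other than converting to POSIXct.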
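As a minimal sketch of that goal in base R (not ff-specific), the month, day and hour can be read off a POSIXlt object once the strings are parsed. The sample strings are taken from the factor levels in the dput output further down; the format string and the UTC time zone are assumptions about the data.

```r
# Sample values copied from the factor levels in the dput output.
time_chr <- c("10/17/2003 0:01", "12/5/1999 0:02", "2/1/2000 0:01",
              "3/23/1998 0:01", "3/24/2013 0:00", "5/29/2004 0:00",
              "5/9/1985 0:01", "6/14/2010 0:01", "6/25/2008 0:02")

# Assumption: month/day/year order and UTC; adjust format and tz for real data.
parsed <- as.POSIXlt(time_chr, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Pull out only the pieces that are needed.
parts <- data.frame(
  month = parsed$mon + 1L,  # POSIXlt months are 0-based
  day   = parsed$mday,
  hour  = parsed$hour
)
```

If only these three integers are needed, they are plain integer columns, so a full POSIXct column may not be necessary at all.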
For example, suppose we have a 9-by-2 table:
test <- read.csv.ffdf(file="test.csv", header=T, first.rows=-1)
It has two columns: ID (numeric class) and time (factor class).
Here is the dput output:
structure(list(virtual = structure(list(VirtualVmode = c("integer",
"integer"), AsIs = c(FALSE, FALSE), VirtualIsMatrix = c(FALSE,
FALSE), PhysicalIsMatrix = c(FALSE, FALSE), PhysicalElementNo = 1:2,
PhysicalFirstCol = c(1L, 1L), PhysicalLastCol = c(1L, 1L)), .Names = c("VirtualVmode",
"AsIs", "VirtualIsMatrix", "PhysicalIsMatrix", "PhysicalElementNo",
"PhysicalFirstCol", "PhysicalLastCol"), row.names = c("ID", "time"
), class = "data.frame", Dim = c(9L, 2L), Dimorder = 1:2), physical = structure(list(
ID = structure(list(), physical = <pointer: 0x000000000821ab20>, virtual = structure(list(), Length = 9L, Symmetric = FALSE), class = c("ff_vector",
"ff")), time = structure(list(), physical = <pointer: 0x000000000821abb0>, virtual = structure(list(), Length = 9L, Symmetric = FALSE, Levels = c("10/17/2003 0:01",
"12/5/1999 0:02", "2/1/2000 0:01", "3/23/1998 0:01", "3/24/2013 0:00",
"5/29/2004 0:00", "5/9/1985 0:01", "6/14/2010 0:01", "6/25/2008 0:02"
), ramclass = "factor"), class = c("ff_vector", "ff"))), .Names = c("ID",
"time")), row.names = NULL), .Names = c("virtual", "physical",
"row.names"), class = "ffdf")
Please provide a small sample of the data as the output of 'dput(head(data))'. – 2014-10-18 16:56:33
To convert a factor, you first need to call 'as.character' on the column. Then you can pass the result to 'as.POSIXct'. – hrbrmstr 2014-10-18 17:26:52
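The suggestion in the comment above can be sketched on an ordinary in-RAM factor as follows. This is plain base R and does not involve ff. The format string is an assumption inferred from the sample values, and UTC is chosen only to make the result reproducible.

```r
# A few values from the sample data, held as an ordinary factor.
time_fac <- factor(c("10/17/2003 0:01", "12/5/1999 0:02", "2/1/2000 0:01"))

# as.numeric() on a factor returns the integer codes, not the timestamps,
# so the labels have to be recovered with as.character() first.
time_chr <- as.character(time_fac)

# Assumption: month/day/year layout. Without an explicit format,
# as.POSIXct() only tries year-first layouts such as "2014-10-18 00:01:02".
time_ct <- as.POSIXct(time_chr, format = "%m/%d/%Y %H:%M", tz = "UTC")

format(time_ct, "%Y-%m-%d %H:%M")
```

When a format is supplied and a string does not match it, the result for that element is NA, which is one plausible source of the NA values described in the question.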
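It seems that after applying as.character, the column is still of class factor. I think the problem is that ff does not support character... maybe I am wrong... – 2014-10-18 17:48:54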
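If a per-row character vector is indeed the obstacle, as this comment suspects, one possible workaround is to parse only the distinct factor levels and then expand them using the factor's integer codes. The sketch below shows the idea on an ordinary factor standing in for one chunk of the column. It is an illustration under assumed format and time zone, not a tested ff recipe.

```r
# Ordinary factor standing in for one chunk of the column held in RAM.
time_fac <- factor(c("2/1/2000 0:01", "10/17/2003 0:01", "2/1/2000 0:01",
                     "5/9/1985 0:01", "10/17/2003 0:01"))

# Parse each distinct level once (format and tz are assumptions).
level_ct <- as.POSIXct(levels(time_fac), format = "%m/%d/%Y %H:%M", tz = "UTC")

# Expand to one value per row by indexing with the integer codes.
time_ct <- level_ct[as.integer(time_fac)]

# POSIXct is stored as a double (seconds since 1970-01-01 UTC).
typeof(time_ct)

# Month, day and hour, if those are all that is needed.
month <- as.integer(format(time_ct, "%m"))
day   <- as.integer(format(time_ct, "%d"))
hour  <- as.integer(format(time_ct, "%H"))
```

The parsing cost then scales with the number of distinct timestamps rather than the number of rows, and the result is numeric. With an ff column, the levels and codes would presumably come from levels(test$time) and from chunks pulled into RAM, but that part is not exercised here.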