Determine the data types of a data frame's columns

113

Your best bet to start is to use ?str(). To explore some examples, let's make some data:

set.seed(3221) # this makes the example exactly reproducible 
my.data <- data.frame(y=rnorm(5), 
         x1=c(1:5), 
         x2=c(TRUE, TRUE, FALSE, FALSE, FALSE), 
         X3=letters[1:5], 
         stringsAsFactors=TRUE) # R >= 4.0 defaults to FALSE; TRUE keeps X3 a factor

@Wilmer E Henao H's solution is very streamlined:

sapply(my.data, class) 
        y        x1        x2        X3 
"numeric" "integer" "logical"  "factor"

Using str() gets you that information plus extra goodies (such as the levels of your factors and the first few values of each variable):

str(my.data) 
'data.frame': 5 obs. of 4 variables: 
$ y : num 1.03 1.599 -0.818 0.872 -2.682 
$ x1: int 1 2 3 4 5 
$ x2: logi TRUE TRUE FALSE FALSE FALSE 
$ X3: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5

@Gavin Simpson's approach is also streamlined, but provides slightly different information than class():

sapply(my.data, typeof) 
       y        x1        x2        X3 
"double" "integer" "logical" "integer"

For more information about class, typeof, and the middle child, mode, see this excellent SO thread: A comprehensive survey of the types of things in R. 'mode' and 'class' and 'typeof' are insufficient.
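To see how the three functions disagree, here is a minimal sketch that lines them up side by side for the example data. The name type_summary is just an illustrative choice, and stringsAsFactors=TRUE is assumed so that X3 is a factor as in the output above.

```r
set.seed(3221)
my.data <- data.frame(y = rnorm(5),
                      x1 = 1:5,
                      x2 = c(TRUE, TRUE, FALSE, FALSE, FALSE),
                      X3 = letters[1:5],
                      stringsAsFactors = TRUE)  # assumption: keep X3 a factor

# One row per column of my.data, one column per way of asking "what type is this?"
type_summary <- data.frame(class  = sapply(my.data, class),
                           typeof = sapply(my.data, typeof),
                           mode   = sapply(my.data, mode))
print(type_summary)
```

The factor column is the interesting row: its class is "factor", its typeof is "integer", and its mode is "numeric".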

Source

2014-01-14 22:55:31 gung

+1

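After using R for several months, I have found that 'str(dataframe)' is the fastest way to determine the column types at a glance. The other approaches require more keystrokes and do not show as much information, but they are helpful if the column data types are an input to other functions. – stackoverflowuser2010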

+0

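Is there an option to treat leading zeros as significant? This often happens with zip codes and coded columns, where in some cases we want to keep them as 'character' instead of 'integer'. For example, columns such as zip codes and lookup codes that will be used in joins with other data frames must keep their leading zeros. – Tagar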

+1

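@Tagar, if I follow you, that doesn't have anything to do w/ 'str()'; you just need to make sure the data types are what you want. Presumably, the leading 0's were stripped when the data were read in. In 'read.table()', there is a 'colClasses' argument that can be used to set a given variable to 'character'. – gung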
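A minimal sketch of the 'colClasses' suggestion, assuming a small made-up CSV with a zip column (the data and the names csv_text, default_read, and kept are hypothetical):

```r
# Hypothetical input: zip codes with leading zeros
csv_text <- "zip,count\n02134,10\n00501,3"

# Default read: zip is converted to integer, so the leading zeros are lost
default_read <- read.table(text = csv_text, header = TRUE, sep = ",")

# Force zip to character; columns not named in colClasses are still auto-detected
kept <- read.table(text = csv_text, header = TRUE, sep = ",",
                   colClasses = c(zip = "character"))

sapply(default_read, class)
sapply(kept, class)
kept$zip
```

Checking with sapply(..., class) right after reading is a quick way to confirm that the join keys stayed 'character'.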

30

sapply(yourdataframe, class)

Where yourdataframe is the name of the data frame you're using

Source

2014-01-14 22:24:35

+3

There's also 'str' ... –

9

If you need the actual types of the vectors in the data frame, I suggest

sapply(foo, typeof)

class() is a somewhat different beast.

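If you don't need to get this information as a vector (i.e. you don't need it to do something else programmatically later), just use str(foo).

In both cases foo would be replaced with the name of your data frame.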
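As a rough illustration of why the two differ and why a vector result is useful, here is a sketch with a made-up foo (the column names are hypothetical):

```r
# Hypothetical data frame with a Date, a factor, and a plain numeric column
foo <- data.frame(when  = as.Date(c("2014-01-14", "2014-01-15")),
                  grp   = factor(c("a", "b")),
                  score = c(1.5, 2.5))

classes <- sapply(foo, class)   # how R dispatches methods on each column
types   <- sapply(foo, typeof)  # how each column is actually stored

classes
types

# Because the result is a named vector, it can be used programmatically,
# e.g. to pick out the columns stored as doubles
double_cols <- names(types)[types == "double"]
double_cols
```

One caveat: class() can return more than one value for a column (a POSIXct column has two classes), in which case sapply(foo, class) gives back a list rather than a character vector; typeof() always returns a single string.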

Source

2014-01-14 22:57:33

2

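Here is a function from the helpRFunctions package that returns a list of all the data types in your data frame, along with the specific variable names associated with each type.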

install.packages('devtools') # Only needed if you dont have this installed. 
library(devtools) 
install_github('adam-m-mcelhinney/helpRFunctions') 
library(helpRFunctions) 
my.data <- data.frame(y=rnorm(5), 
        x1=c(1:5), 
        x2=c(TRUE, TRUE, FALSE, FALSE, FALSE), 
        X3=letters[1:5], 
        stringsAsFactors=TRUE) # R >= 4.0 defaults to FALSE; TRUE keeps X3 a factor
t <- list.df.var.types(my.data) 
t$factor 
t$integer 
t$logical 
t$numeric

You could then do something like var(my.data[t$numeric]).

Hope this is helpful!

Source

2014-11-25 23:25:54

+1

It's worth noting that under the hood this is 'lapply(your_data, class)' with a bit of extra processing for formatting. – Gregor
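For comparison, a base-R sketch that gives a similar list of column names grouped by class without installing anything. This is only an approximation of the idea, not the package's actual implementation, and the names col_classes and by_type are illustrative.

```r
set.seed(3221)
my.data <- data.frame(y = rnorm(5),
                      x1 = 1:5,
                      x2 = c(TRUE, TRUE, FALSE, FALSE, FALSE),
                      X3 = letters[1:5],
                      stringsAsFactors = TRUE)  # assumption: keep X3 a factor

# First class of each column (some columns carry more than one class)
col_classes <- sapply(my.data, function(col) class(col)[1])

# Group the column names by class
by_type <- split(names(my.data), col_classes)

by_type$factor
by_type$numeric

# Same kind of follow-up as in the answer: variance of the numeric columns
var(my.data[by_type$numeric])
```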

3

Simply pass your data frame to the following function:

data_types <- function(frame) { 
    res <- lapply(frame, class) 
    res_frame <- data.frame(unlist(res)) 
    barplot(table(res_frame), main="Data Types", col="steelblue", ylab="Number of Features") 
}

This produces a plot of all the data types in your data frame. For the iris dataset we get the following:

Source

2016-12-27 23:54:13 Cybernetic

2

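Since it wasn't stated clearly, I'll just add this:

I was looking for a way to create a table that holds the number of occurrences of each data type.

Say we have a data.frame with two numeric columns and one logical column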

dta <- data.frame(a = c(1,2,3), 
        b = c(4,5,6), 
        c = c(TRUE, FALSE, TRUE))

You can summarize the number of columns of each data type with

table(unlist(lapply(dta, class))) 
# logical numeric 
#       1       2

This comes in really handy if you have a lot of columns and want a quick overview.

To give credit: this solution was inspired by the answer of @Cybernetic.

Source

2017-08-18 11:36:37 loki
