2016-01-03

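A factor in R is a kind of vector whose elements are categorical values that can also be ordered. The values are stored internally as integers with labeled levels. What is Julia's answer to R's factor concept?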

# In R: 
> x = c("high" , "medium" , "low" , "high" , "medium") 

> xf = factor(x) 
> xf 
[1] high  medium low  high  medium 
Levels: high low medium 

> as.numeric(xf) 
[1] 1 3 2 1 3 

> xfo = factor(x , levels=c("low","medium","high") , ordered=TRUE) 
> xfo 
[1] high  medium low  high  medium 
Levels: low < medium < high 

> as.numeric(xfo) 
[1] 3 2 1 3 2 

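I checked the Julia documentation and John Myles White's Comparing Julia and R's Vocabularies (possibly obsolete), and there seems to be no such concept as a factor. Are factors not used that often, and what is Julia's approach to this problem?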

Answers

The PooledDataArray in the DataFrames package is one possible counterpart to R's factors. The following implements your example using it:

julia> using DataFrames # install with Pkg.add("DataFrames") if required 

julia> x = ["high" , "medium" , "low" , "high" , "medium"]; 

julia> xf = PooledDataArray(x) 
5-element DataArrays.PooledDataArray{ASCIIString,UInt32,1}: 
"high" 
"medium" 
"low" 
"high" 
"medium" 

julia> xf.refs 
5-element Array{UInt32,1}: 
0x00000001 
0x00000003 
0x00000002 
0x00000001 
0x00000003 

julia> xfo = PooledDataArray(x,["low","medium","high"]); 

julia> xfo.refs 
5-element Array{UInt32,1}: 
0x00000003 
0x00000002 
0x00000001 
0x00000003 
0x00000002 
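
The `refs` shown above are integer codes that index into a pool of levels. As a rough illustration of that representation only, here is a minimal sketch in plain Julia. The `encode` helper is hypothetical (it is not part of DataFrames or any other package) and assumes every value appears in the supplied levels.

```julia
# Hypothetical helper, not from any package: encode values as integer codes
# that index into a vector of levels, similar in spirit to R's factor.
function encode(values::AbstractVector, levels::AbstractVector = sort(unique(values)))
    lookup = Dict(level => code for (code, level) in enumerate(levels))
    codes = [lookup[v] for v in values]   # throws KeyError if a value is not a level
    return codes, levels
end

x = ["high", "medium", "low", "high", "medium"]

# Default: levels are the sorted unique values, as in factor(x).
codes, levels = encode(x)
# codes == [1, 3, 2, 1, 3], levels == ["high", "low", "medium"]

# Explicit level order, as in factor(x, levels = c("low", "medium", "high")).
ocodes, olevels = encode(x, ["low", "medium", "high"])
# ocodes == [3, 2, 1, 3, 2]

# Decoding is plain indexing into the levels vector.
decoded = olevels[ocodes]
```

The codes match both `as.numeric(xf)` in the R session and `xf.refs` above, because all three sort the levels alphabetically unless an order is given.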

The CategoricalArray type from CategoricalArrays.jl is similar to a factor.
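
For comparison with the R session in the question, a minimal sketch using CategoricalArrays.jl might look like the following. It assumes a reasonably recent version of the package in which `categorical`, `levels!`, `ordered!`, `isordered`, and `levelcode` are available; names and printed output may differ between versions.

```julia
using CategoricalArrays   # install with Pkg.add("CategoricalArrays") if required

x = ["high", "medium", "low", "high", "medium"]

# Unordered categorical vector; levels default to the sorted unique values.
xf = categorical(x)
levels(xf)        # ["high", "low", "medium"]
levelcode.(xf)    # integer codes: [1, 3, 2, 1, 3]

# Reorder the levels and mark the array as ordered,
# roughly like factor(x, levels = ..., ordered = TRUE) in R.
xfo = categorical(x)
levels!(xfo, ["low", "medium", "high"])
ordered!(xfo, true)

levelcode.(xfo)   # [3, 2, 1, 3, 2]
isordered(xfo)    # true
xfo[1] > xfo[2]   # true: "high" > "medium" under the declared order
```

Here `levelcode` plays the role of `as.numeric` on an R factor, and comparisons with `<` and `>` between elements are only meaningful once the array is marked as ordered.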