2012-11-09

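Adding a vector-valued column to a data frame - summary(df)

I have a vector of strings, each of which is a comma-separated list of ids. I want to split each string into a list and store both the number of ids and the set of ids as two new columns in a data frame. Here is an example: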

df = data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""), stringsAsFactors=FALSE) 
ids = sapply(df$ids, function (s) unlist(strsplit(as.character(s), ","))) 
df$num.ids = sapply(ids, length) 
df$ids.vec = sapply(ids, unlist) 

So far this looks fine:

> df 
    ids num.ids ids.vec 
1 a,b,c  3 a, b, c 
2  d  1  d 
3  e  1  e 
4    0   
5 f,g  2 f, g 
6    0   
7  h  1  h 
8  i  1  i 
9    0  

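But when I type summary(df), I get mysterious ids.vec columns. More importantly, summary() does not compute a summary for that column; it lists every row instead, which is a problem when I apply this to my real data set.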

> summary(df) 
     ids    num.ids ids.vec.Length ids.vec.Class ids.vec.Mode 
Length:9   Min. :0 3   -none-  character    
Class :character 1st Qu.:0 1   -none-  character    
Mode :character Median :1 1   -none-  character    
        Mean :1 0   -none-  character    
        3rd Qu.:1 2   -none-  character    
        Max. :3 0   -none-  character    
           1   -none-  character    
           1   -none-  character    
           0   -none-  character 

Any idea what I am doing wrong?

Thanks! Kevin


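What exactly are you expecting as part of the summary? You have added a list as a column of the data frame, rather than an atomic vector. That will make things look a bit "weird". – joran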

Answer


You are not doing anything wrong. As @joran mentioned, the real question is what information you expect to get from summary().

What you are seeing is a combination of two summaries:

# df1 is df less ids.vec; df2 is only ids.vec 
df1 <- df[,names(df) != "ids.vec"] 
df2 <- df[,names(df) == "ids.vec"] 

> summary(df1) # summary for a data frame 
    ids    num.ids 
Length:9   Min. :0 
Class :character 1st Qu.:0 
Mode :character Median :1 
        Mean :1 
        3rd Qu.:1 
        Max. :3 

> summary(df2) # summary for a list 
     Length Class Mode  
a,b,c 3  -none- character 
d  1  -none- character 
e  1  -none- character 
     0  -none- character 
f,g 2  -none- character 
     0  -none- character 
h  1  -none- character 
i  1  -none- character 
     0  -none- character 

The formatting of the combined summary is a bit awkward.

Note that the entire summary of the list is packed into a single column:

> colnames(summary(df)) 
[1] " ids"          
[2] " num.ids"         
[3] "ids.vec.Length ids.vec.Class ids.vec.Mode" 

Also note that df2 is a list.

> str(df2) 
List of 9 
$ a,b,c: chr [1:3] "a" "b" "c" 
$ d : chr "d" 
$ e : chr "e" 
$  : chr(0) 
$ f,g : chr [1:2] "f" "g" 
$  : chr(0) 
$ h : chr "h" 
$ i : chr "i" 
$  : chr(0) 
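
To check for this situation up front, one option is to test each column with is.list(). The sketch below rebuilds the example with strsplit() and lengths() instead of the sapply() calls from the question, which is an assumption made for brevity; the resulting list column is the same apart from its element names. The name list.cols is only illustrative.

```r
# Rebuild the example data frame from the question.
df <- data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""),
                 stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per input string.
ids <- strsplit(df$ids, ",")
df$num.ids <- lengths(ids)   # number of ids in each row
df$ids.vec <- ids            # assigning a list creates a list column

# Flag the columns that are lists rather than atomic vectors.
list.cols <- sapply(df, is.list)
print(list.cols)
```

Any column flagged TRUE here is one that summary() will treat as a list rather than as an ordinary vector.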

這是原始數據框

> str(df) 
'data.frame': 9 obs. of 3 variables: 
$ ids : chr "a,b,c" "d" "e" "" ... 
$ num.ids: int 3 1 1 0 2 0 1 1 0 
$ ids.vec:List of 9 
    ..$ a,b,c: chr "a" "b" "c" 
    ..$ d : chr "d" 
    ..$ e : chr "e" 
    ..$  : chr 
    ..$ f,g : chr "f" "g" 
    ..$  : chr 
    ..$ h : chr "h" 
    ..$ i : chr "i" 
    ..$  : chr 
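
If the goal is a compact summary, one possible approach (a sketch, not the only way to do it) is to summarise the atomic columns on their own and tabulate the contents of the list column separately. The names atomic and id.counts are illustrative, and the data frame is rebuilt here with strsplit() and lengths() rather than with the code from the question.

```r
# Rebuild the example data frame with a list column.
df <- data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""),
                 stringsAsFactors = FALSE)
df$ids.vec <- strsplit(df$ids, ",")
df$num.ids <- lengths(df$ids.vec)

# Summarise only the atomic columns; drop = FALSE keeps a data frame
# even if a single column remains.
atomic <- df[, !sapply(df, is.list), drop = FALSE]
print(summary(atomic))

# Summarise the list column separately, here by counting how often
# each id occurs across all rows.
id.counts <- table(unlist(df$ids.vec))
print(id.counts)
```

The num.ids column already gives the per-row length of the list column, so its summary describes the distribution of list sizes.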