2015-10-17
0

Converting observations into variables

I have data in the following format:

|id|genre1 |genre2 |genre3 |
|1 |action |comedy |romance|
|2 |comedy |romance|       |
|3 |romance|       |       |

I would like to convert my data into this format:

|id|action|comedy|romance|
|1 |1     |1     |1      |
|2 |0     |1     |1      |
|3 |0     |0     |1      |

What is the best way to do this?

Answers

1

You can do this by reshaping the data:

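library(dplyr)
library(tidyr)

df %>%
    # stack genre1..genre3 into one 'genre' column (long format)
    gather(number, genre, genre1:genre3) %>%
    # drop the empty cells
    filter(genre != "") %>%
    select(-number) %>%
    # indicator that becomes the cell value in the wide table
    mutate(one = 1) %>%
    # one column per genre; missing combinations are filled with 0
    spread(genre, one, fill = 0)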
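For readers on a recent tidyr, the same pipeline can probably be written with `pivot_longer()` and `pivot_wider()`, which the tidyr documentation recommends over `gather()` and `spread()`. This is only a sketch, assuming tidyr 1.0.0 or later and the question's sample data with empty strings for the missing genres; the name `result` is introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; empty strings mark missing genres (assumption)
df <- data.frame(
  id = 1:3,
  genre1 = c("action", "comedy", "romance"),
  genre2 = c("comedy", "romance", ""),
  genre3 = c("romance", "", ""),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Stack genre1..genre3 into a single 'genre' column
  pivot_longer(genre1:genre3, names_to = "number", values_to = "genre") %>%
  # Drop the empty cells
  filter(genre != "") %>%
  select(-number) %>%
  mutate(one = 1) %>%
  # One column per genre; combinations that never occur are filled with 0
  pivot_wider(names_from = genre, values_from = one,
              values_fill = list(one = 0))

print(result)
```

The result is a tibble rather than a plain data frame; wrapping it in `as.data.frame()` should give the latter if that matters.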
1

With base R, you can use reshape and table:

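mydf <- data.frame(id = 1:3,
                   genre1 = c("action", "comedy", "romance"),
                   genre2 = c("comedy", "romance", NA),
                   genre3 = c("romance", NA, NA))

# prefix the genre columns with "genre." so reshape() can split each name into variable and time
colnames(mydf)[2:4] <- paste0("genre.", colnames(mydf)[2:4])
m_data <- reshape(mydf, direction = "long", varying = 2:4)
# cross-tabulate id against genre; NA entries are dropped
with(m_data, table(id, genre))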

   genre
id  action comedy romance
  1      1      1       1
  2      0      1       1
  3      0      0       1
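Note that `table()` returns a contingency table, not a data frame with an `id` column as in the requested layout. If a data frame is needed, one possible follow-up is sketched below; the names `tab` and `wide` are hypothetical, and `stringsAsFactors = FALSE` is set here only to keep the example independent of the R version.

```r
mydf <- data.frame(
  id = 1:3,
  genre1 = c("action", "comedy", "romance"),
  genre2 = c("comedy", "romance", NA),
  genre3 = c("romance", NA, NA),
  stringsAsFactors = FALSE
)

# Same reshape + table steps as in the answer
colnames(mydf)[2:4] <- paste0("genre.", colnames(mydf)[2:4])
m_data <- reshape(mydf, direction = "long", varying = 2:4)
tab <- with(m_data, table(id, genre))

# as.data.frame.matrix() keeps the wide layout;
# plain as.data.frame() on a table would return long format instead
wide <- as.data.frame.matrix(tab)

# Move the ids from the row names into a regular column
wide <- cbind(id = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL

print(wide)
```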
2

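Assuming the empty elements are empty strings (i.e. they contain no whitespace), you can first replace those elements with NA and then reshape the data with the reshape2 package.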

is.na(df) <- df == "" 

library(reshape2) 
dcast(melt(df, 1, na.rm = TRUE), id ~ value, length) 
#   id action comedy romance
# 1  1      1      1       1
# 2  2      0      1       1
# 3  3      0      0       1
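If the empty-string assumption does not hold and some cells contain only spaces, one option is to trim whitespace before marking the empty cells as NA. The sketch below uses `trimws()` from base R (available since R 3.2.0); the whitespace-only sample values and the name `genre_cols` are made up for illustration and are not from the original data.

```r
# Hypothetical variant of the data where some "empty" cells hold spaces
df <- data.frame(
  id = 1:3,
  genre1 = c("action", "comedy", "romance"),
  genre2 = c("comedy", "romance", "  "),
  genre3 = c("romance", "", " "),
  stringsAsFactors = FALSE
)

genre_cols <- c("genre1", "genre2", "genre3")

# Strip leading/trailing whitespace so that "  " becomes ""
df[genre_cols] <- lapply(df[genre_cols], trimws)

# Mark the now-empty strings as NA, as in the answer
is.na(df) <- df == ""

print(df)
```

After this step the `dcast(melt(...))` call from the answer should apply unchanged.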

Or, for fun, as a one-liner that leaves the original data unchanged:

dcast(melt(replace(df, df == "", NA), 1, na.rm = TRUE), id ~ value, length) 
#   id action comedy romance
# 1  1      1      1       1
# 2  2      0      1       1
# 3  3      0      0       1

The original data used:

df <- structure(list(id = 1:3,
                     genre1 = c("action", "comedy", "romance"),
                     genre2 = c("comedy", "romance", ""),
                     genre3 = c("romance", "", "")),
                .Names = c("id", "genre1", "genre2", "genre3"),
                class = "data.frame",
                row.names = c(NA, -3L))