Transforming observations into variables
I have data in the following format:
|id|genre1|genre2 |genre3 |
|1 |action|comedy |romance|
|2 |comedy|romance| |
|3 |romance| | |
I would like to transform my data into this format:
|id|action|comedy|romance|
|1 |1 |1 |1 |
|2 |0 |1 |1 |
|3 |0 |0 |1 |
What is the best way to do this?
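You can reshape the data with tidyr and dplyr: go from wide to long, then back to wide with one indicator column per genre.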
library(dplyr)
library(tidyr)
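df %>%
  gather(number, genre, genre1:genre3) %>%  # stack the genre columns into long format
  filter(genre != "") %>%                   # drop the empty entries
  select(-number) %>%                       # the source column name is no longer needed
  mutate(one = 1) %>%                       # indicator value to spread out
  spread(genre, one, fill = 0)              # one column per genre, 0 where absent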
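In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The pipeline above could be sketched with the newer functions roughly as follows. This assumes tidyr >= 1.0.0 is installed and that the empty cells are empty strings, as in the sample data at the end of this post; the name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data, matching the df given at the end of the post
df <- data.frame(
  id     = 1:3,
  genre1 = c("action", "comedy", "romance"),
  genre2 = c("comedy", "romance", ""),
  genre3 = c("romance", "", ""),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # stack genre1..genre3 into a single "genre" column
  pivot_longer(genre1:genre3, names_to = "number", values_to = "genre") %>%
  filter(genre != "") %>%   # drop the empty entries
  select(-number) %>%
  mutate(one = 1) %>%       # indicator value
  # one column per genre; absent combinations are filled with 0
  pivot_wider(names_from = genre, values_from = one,
              values_fill = list(one = 0))

wide
```

The logic is unchanged from the gather()/spread() version; only the reshaping verbs differ.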
With base R, you can use reshape and table:
mydf <-data.frame(id=1:3,
genre1=c("action","comedy","romance"),
genre2=c("comedy","romance",NA),
genre3=c("romance",NA,NA))
colnames(mydf)[2:4] <- paste0("genre.",colnames(mydf)[2:4])
m_data <- reshape(mydf,direction="long", varying=2:4)
with(m_data, table(id, genre))
genre
id action comedy romance
1 1 1 1
2 0 1 1
3 0 0 1
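The result above is a table object rather than a data frame. If a data frame with an explicit id column is needed, one possible way to convert it is sketched below. It repeats the base R steps so that it runs on its own; the names `tab` and `wide` are illustrative and not part of the original answer.

```r
# Same sample data as in the base R answer (missing genres are NA)
mydf <- data.frame(id = 1:3,
                   genre1 = c("action", "comedy", "romance"),
                   genre2 = c("comedy", "romance", NA),
                   genre3 = c("romance", NA, NA),
                   stringsAsFactors = FALSE)
colnames(mydf)[2:4] <- paste0("genre.", colnames(mydf)[2:4])
m_data <- reshape(mydf, direction = "long", varying = 2:4)

# Cross-tabulate id against genre (NA values are dropped by table())
tab <- with(m_data, table(id, genre))

# Turn the two-way table into a data frame and restore id as a column
wide <- cbind(id = as.integer(rownames(tab)), as.data.frame.matrix(tab))
rownames(wide) <- NULL
wide
```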
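Assuming the empty elements are empty strings (i.e., they contain no whitespace), you can first replace those elements with NA and then reshape the data with the reshape2 package.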
is.na(df) <- df == ""
library(reshape2)
dcast(melt(df, 1, na.rm = TRUE), id ~ value, length)
# id action comedy romance
# 1 1 1 1 1
# 2 2 0 1 1
# 3 3 0 0 1
Or, just for fun, as a one-liner that leaves the original data unchanged:
dcast(melt(replace(df, df == "", NA), 1, na.rm = TRUE), id ~ value, length)
# id action comedy romance
# 1 1 1 1 1
# 2 2 0 1 1
# 3 3 0 0 1
Original data used:
df <- structure(list(id = 1:3, genre1 = c("action", "comedy", "romance"
), genre2 = c("comedy", "romance", ""), genre3 = c("romance",
"", "")), .Names = c("id", "genre1", "genre2", "genre3"), class = "data.frame", row.names = c(NA,
-3L))