2015-12-30 151 views
0

Converting data from long format to wide format

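I want to convert data from long format to wide format. Each ColA value should end up as exactly one row. Values in ColB can repeat within a ColA group, and in that case I am trying to aggregate them by count. ColF is aggregated with sum().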
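To make the goal concrete, here is a minimal base R sketch on made-up data. The values below are hypothetical, since sample.csv itself is not shown; only the column names ColA, ColB and ColF come from the description above.

```r
# Hypothetical toy data standing in for sample.csv.
s <- data.frame(
  ColA = c("u1", "u1", "u1", "u2", "u2"),
  ColB = c("x", "x", "y", "y", "z"),
  ColF = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Count how often each ColB value occurs per ColA (one row per ColA).
counts <- as.data.frame.matrix(table(s$ColA, s$ColB))
counts$ColA <- rownames(counts)
rownames(counts) <- NULL

# Sum ColF per ColA.
sums <- aggregate(ColF ~ ColA, data = s, FUN = sum)

# Join the two pieces on ColA: one row per ColA, one count column per ColB value.
wide <- merge(counts, sums, by = "ColA")
print(wide)
```

The result has one row per ColA, one count column per distinct ColB value, and the summed ColF.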

library(readr)  # read_csv() comes from the readr package
s <- read_csv("sample.csv")
s_1 <- subset(s, select=c("ColA", "ColF")) 
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum) 
head(grp_by) 

I am not sure how to convert the rest of the columns.

Update: based on the suggestion, I am now using the reshape2 package.

library(readr)     # read_csv()
library(reshape2)  # dcast()

s <- read_csv("sample.csv") 
s_1 <- subset(s, select=c("ColA", "ColF")) 
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum) 

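# Count occurrences per ColA; fun.aggregate = length makes dcast's default explicit
s2 <- dcast(s, ColA ~ ColB, fun.aggregate = length)
s3 <- dcast(s, ColA ~ ColC, fun.aggregate = length)
s4 <- dcast(s, ColA ~ ColD, fun.aggregate = length)
s5 <- dcast(s, ColA ~ ColE, fun.aggregate = length)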

print(s2) 
print(s3) 
print(s4) 
print(s5) 
print(grp_by) 

This is the output of those print statements:

[image: output of the print statements above]

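How can I merge all of these into a single data frame? My actual data set has 1 million records. Is this code efficient enough to run on it, or is there a better way to write it? Thanks for your help.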

+3

Take a look here: http://stackoverflow.com/questions/5890584/reshape-data-from-long-to-wide-format-r –

+0

@DavidArenburg Thanks for your suggestion. I have updated the question after using reshape2. Could you please check it again and guide me accordingly? Thank you. – prasanth

+1

See here for how to provide a reproducible example and the desired output: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example –

Answer

0

This is the sample code I used to transform and merge the data. There may be better ways, but this is the best I could come up with.

# Include needed libraries 
library(readr)     # read_csv()
library(reshape2)  # dcast()

# Load the sample data 
s <- read_csv("sample.csv") 

# Aggregate ColF by SUM for each ColA 
s_1 <- subset(s, select=c("ColA", "ColF")) 
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum) 

# Long to Wide format 
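# fun.aggregate = length counts occurrences (dcast's default, made explicit)
s2 <- dcast(s, ColA ~ ColB, fun.aggregate = length)
s3 <- dcast(s, ColA ~ ColC, fun.aggregate = length)
s4 <- dcast(s, ColA ~ ColD, fun.aggregate = length)
s5 <- dcast(s, ColA ~ ColE, fun.aggregate = length)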

# But this is the crude way of removing NA columns which I used! 
# Rename the NA column into something so that it can be removed by assigning NULL!! 
colnames(s2)[7] <- "RemoveMe" 
colnames(s3)[5] <- "RemoveMe" 
colnames(s4)[5] <- "RemoveMe" 
colnames(s5)[4] <- "RemoveMe" 

s2$RemoveMe <- NULL 
s3$RemoveMe <- NULL 
s4$RemoveMe <- NULL 
s5$RemoveMe <- NULL 

# Merge all pieces to form the final transformed data 
s2 <- merge(x = s2, y = s3, by="ColA", all = TRUE) 
s2 <- merge(x = s2, y = s4, by="ColA", all = TRUE) 
s2 <- merge(x = s2, y = s5, by="ColA", all = TRUE) 
s2 <- merge(x = s2, y = grp_by, by="ColA", all = TRUE) 

# Removing the row with user_id = NA!! 
s2 <- s2[-c(4), ] 

# Final transformed data 
print(s2) 
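The positional clean-up in the code above (column 7, row 4, and so on) only works for this particular sample. A possibly more robust variant is to drop the unwanted column by name and the unwanted row by testing the key. The sketch below uses a hypothetical stand-in for one dcast() result and assumes the extra column is literally named "NA"; check names(s2) on the real data before relying on that.

```r
# Hypothetical stand-in for one dcast() result that picked up an "NA" column
# and a row whose key (ColA) is missing. The real output is not reproduced here.
s2 <- data.frame(ColA = c("u1", "u2", NA), x = c(2, 0, 1),
                 stringsAsFactors = FALSE)
s2[["NA"]] <- c(0, 0, 1)

# Drop the column by name instead of by position (assumes it is named "NA").
s2 <- s2[, names(s2) != "NA", drop = FALSE]

# Drop rows with a missing key instead of hard-coding the row number.
s2 <- s2[!is.na(s2$ColA), , drop = FALSE]

print(s2)
```

Because nothing depends on column or row positions, the same two lines can be applied to each of s2 to s5 and should keep working when the data changes.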

I used these as references:

  1. dcast - How to reshape data from long to wide format?
  2. merge - How to join (merge) data frames (inner, outer, left, right)?
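Putting the two references together, the repeated dcast() and merge() calls could also be written as a loop plus Reduce(). This is only a sketch on made-up data: the values are hypothetical, only two categorical columns are shown, and it assumes the category values in different columns do not share names (otherwise merge() would add suffixes).

```r
library(reshape2)

# Hypothetical toy data; the real sample.csv is not shown.
s <- data.frame(
  ColA = c("u1", "u1", "u1", "u2", "u2"),
  ColB = c("x", "x", "y", "y", "z"),
  ColC = c("p", "q", "q", "p", "p"),
  ColF = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# One count table per categorical column.
cols <- c("ColB", "ColC")
pieces <- lapply(cols, function(col) {
  dcast(s, as.formula(paste("ColA ~", col)),
        fun.aggregate = length, value.var = "ColF")
})

# Add the per-ColA sum of ColF as one more piece.
pieces[[length(pieces) + 1]] <- aggregate(ColF ~ ColA, data = s, FUN = sum)

# Merge all pieces on ColA in one pass.
wide <- Reduce(function(x, y) merge(x, y, by = "ColA", all = TRUE), pieces)
print(wide)
```

Adding ColD and ColE would only mean extending the cols vector.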