2016-02-13 70 views
3

R: how to turn a data frame into a single row

I have a data frame that looks like this:

days target probability 
1 75 0.80 0.9060341 
2 100 0.90 0.75 

df <- structure(list(days = c(75, 100, 120, 150, 200, 300, 75, 100, 
120, 150, 200, 300, 75, 100, 120, 150, 200, 300, 75, 100, 120, 
150, 200, 300, 75, 100, 120, 150, 200, 300, 75, 100, 120, 150, 
200, 300), target = c(0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.9, 0.9, 
0.9, 0.9, 0.9, 0.9, 1, 1, 1, 1, 1, 1, 1.05, 1.05, 1.05, 1.05, 
1.05, 1.05, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.2, 1.2, 1.2, 1.2, 
1.2, 1.2), probability = c(0.90603410539241, 0.90603410539241, 
0.90603410539241, 0.90603410539241, 0.90603410539241, 0.904213051602258, 
0.733995206180212, 0.733995206180212, 0.733995206180212, 0.733995206180212, 
0.733995206180212, 0.731795453278156, 0.512082243536284, 0.512082243536284, 
0.512082243536284, 0.512082243536284, 0.512082243536284, 0.511492313399902, 
0.390943562448882, 0.390943562448882, 0.390943562448882, 0.390943562448882, 
0.390943562448882, 0.391451116324459, 0.282452594645645, 0.282452594645645, 
0.282452594645645, 0.282452594645645, 0.282452594645645, 0.283766337160544, 
0.106271449405461, 0.106271449405461, 0.106271449405461, 0.106271449405461, 
0.106271449405461, 0.107778317673786)), .Names = c("days", "target", 
"probability"), class = "data.frame", row.names = c(1L, 2L, 3L, 
4L, 5L, 7L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 20L, 
21L, 23L, 25L, 26L, 27L, 28L, 29L, 31L, 33L, 34L, 35L, 36L, 37L, 
40L, 43L, 44L, 45L, 46L, 47L, 49L)) 

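and I would like to emit a CSV containing a single row, with headers of the following form:

day75_target0.80, day100_target0.9, and so on. The value under each header should simply be the corresponding probability.

Any ideas?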

+4

Please remove the '+'s from the 'dput' output.

Answers

1

Consider this base R approach, which simply concatenates the fields and then transposes the data frame:

# CONCATENATE DAYS AND TARGET FIELDS INTO ONE LABEL PER ROW 
newdf <- data.frame(daystarget = paste0("day", df$days, "_target", df$target), 
                    probability = df$probability, stringsAsFactors = FALSE) 
# ROUND PROBABILITY TO ONE DIGIT 
newdf$probability <- round(as.numeric(newdf$probability), 1) 

# TRANSPOSE DATA FRAME 
finaldf <- data.frame(t(newdf), stringsAsFactors = FALSE) 
# RENAME COLUMNS TO FIRST ROW 
names(finaldf) <- unlist(finaldf[1, ]) 
# REMOVE PREVIOUS FIRST ROW 
finaldf <- finaldf[2, ] 
# RESET ROW NAMES 
row.names(finaldf) <- 1:nrow(finaldf) 

write.csv(finaldf, "FinalDF.csv", row.names = FALSE) 

# day75_target0.8 day100_target0.8 day120_target0.8 day150_target0.8 ... 
#1    0.9    0.9    0.9    0.9 ...   
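
As a possible alternative to transposing, the one-row data frame can be built directly from a named list, which keeps the probabilities numeric and unrounded. This is only a sketch: the small `df` below is a hypothetical stand-in for a few rows of the question's data, and the output path is a temporary file chosen for illustration.

```r
# Hypothetical stand-in for a few rows of the question's df
df <- data.frame(days = c(75, 100, 75, 100),
                 target = c(0.8, 0.8, 0.9, 0.9),
                 probability = c(0.90603410539241, 0.90603410539241,
                                 0.733995206180212, 0.733995206180212))

# One header name per row of df, e.g. "day75_target0.8"
col_names <- paste0("day", df$days, "_target", df$target)

# Named list of probabilities -> one-row data frame;
# check.names = FALSE keeps the names exactly as built
one_row <- data.frame(setNames(as.list(df$probability), col_names),
                      check.names = FALSE)

# Write to a temporary location (assumed path, adjust as needed)
out_file <- file.path(tempdir(), "FinalDF.csv")
write.csv(one_row, out_file, row.names = FALSE)
```

Note that `paste0` prints 0.8 as "0.8", not "0.80"; if the two-decimal form from the question is required, `sprintf("%.2f", df$target)` could be used when building the names.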
0

This isn't the most attractive thing to do to your poor data, but taking the request at face value, it is quite simple.

library(tidyverse) 
#first create the columns: 
> df %>% unite(daytarg, days, target, sep = "_target") %>% head 
     daytarg probability 
1 75_target0.8 0.9060341 
2 100_target0.8 0.9060341 
3 120_target0.8 0.9060341 
4 150_target0.8 0.9060341 
5 200_target0.8 0.9060341 
7 300_target0.8 0.9042131 

It seems sensible to check that the resulting column names will be unique:

> df %>% unite(daytarg, days, target, sep = "_target") %>% count(daytarg) %>% filter(n > 1) 
# A tibble: 0 x 2 
# ... with 2 variables: daytarg <chr>, n <int> 

OK, good. Now we can add a spread:

> df %>% 
    unite(daytarg, days, target, sep = "_target") %>% 
    spread(daytarg, probability) %>% 
    write_csv("output.csv") 

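So all this really does is create the desired names from the relevant columns and then spread those names into columns holding the probability values. Be careful with this kind of thing, though: make sure the combinations are unique.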
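
The uniqueness caveat can also be checked in base R before reshaping. The following is a minimal sketch on hypothetical toy data with one deliberately repeated days/target pair; keeping the first row per key is just one possible resolution and is an assumption, not something the question specifies.

```r
# Hypothetical toy data: rows 1 and 3 share the same days/target pair
toy <- data.frame(days = c(75, 100, 75),
                  target = c(0.8, 0.8, 0.8),
                  probability = c(0.91, 0.90, 0.73))

# Build the would-be column names
keys <- paste0("day", toy$days, "_target", toy$target)

# anyDuplicated() returns 0 if all keys are unique, otherwise the index
# of the first key that repeats an earlier one
dup_index <- anyDuplicated(keys)
if (dup_index > 0) {
  message("Duplicate column name: ", keys[dup_index])
}

# Assumed resolution for illustration: keep the first row for each key
toy_unique <- toy[!duplicated(keys), ]
```

With real data it may be more appropriate to aggregate the duplicates (for example with a mean) or to stop with an error, depending on what the repeated rows represent.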