2016-04-25

How to make a table showing changes over time in R

My data looks like this:

ID Joint_time leave_time group 
1 201501  201603  2 
2 201508  201601  2 
3 201503  201601  2 
4 201512  201601  3 
5 201511  201602  2 
6 201503  .   1 
7 201503  .   1 
8 201506  201602  3 
9 201507  .   1 
10 201503  .   1 
11 201601  201602  2 
12 201601  .   1 
13 201601  201603  2 
14 201601  201602  3 
15 201601  201602  3 
16 201602  .   1 
17 201602  .   1 
18 201602  201603  3 
19 201602  .   1 
20 201602  .   1 
21 201602  .   1 
22 201603  .   1 
23 201603  .   1 
24 201603  .   1 
25 201603  .   1 
26 201603  .   1 
27 201603  .   1 
28 201603  .   1 

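I want to know how the total number of customers changes at the end of each month, and I want to show how many customers left and how many joined. I only know how to use table(), but that function does not seem able to handle a table this complex. My data is as follows: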

ID<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28) 
Joint_time<-c("201501","201508","201503","201512","201511","201503","201503","201506","201507","201503","201601","201601","201601","201601","201601","201602","201602","201602","201602","201602","201602","201603","201603","201603","201603","201603","201603","201603") 
leave_time<-c("201603","201601","201601","201601","201602",".",".","201602",".",".","201602",".","201603","201602","201602",".",".","201603",".",".",".",".",".",".",".",".",".",".") 
group<-c(2,2,2,3,2,1,1,3,1,1,2,1,2,3,3,1,1,3,1,1,1,1,1,1,1,1,1,1) 
question_table<-data.frame(ID,Joint_time,leave_time,group) 

I want to build a table like the following:

                                              201601 201602 201603
Total number at month beginning                   10     12     13
Joined this month                                  5      6      7
Group 2, joined during 2015, left this month       2      1      1
Group 2, joined during 2016, left this month       0      1      1
Group 3, joined during 2015, left this month       1      1      0
Group 3, joined during 2016, left this month       0      2      1
Total number at month end                         12     13     17

Answers

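I am going to help with each part of the required output separately, because I am not convinced that putting all of this data into a single data frame in this format is a good idea. If you really do need that format, I can edit the answer.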

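To count how many people left in each month, broken down by group and by the year they joined, you can use a combination of the dplyr and tidyr packages as follows: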

library(dplyr) 
library(tidyr) 
question_table %>% 
    filter(leave_time != '.') %>% 
    mutate(Joint_year = substr(Joint_time, 1, 4)) %>% 
    group_by(group, leave_time, Joint_year) %>% 
    summarise(left = n()) %>% 
    spread(leave_time, left, fill = 0) 

The output is as follows:

Source: local data frame [4 x 5] 
Groups: group [2] 

    group Joint_year 201601 201602 201603 
    (dbl)  (chr) (dbl) (dbl) (dbl) 
1  2  2015  2  1  1 
2  2  2016  0  1  1 
3  3  2015  1  1  0 
4  3  2016  0  2  1 
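Since the question mentions table(), the same counts can also be sketched in base R with xtabs() and ftable(). This is only an illustrative alternative to the dplyr pipeline above, not part of the original answer. The data frame is rebuilt from the question's values, and the names `left` and `left_counts` are made up for this example.

```r
# Rebuild the question's data. stringsAsFactors = FALSE keeps the month
# columns as plain text on every R version.
question_table <- data.frame(
  ID = 1:28,
  Joint_time = c("201501", "201508", "201503", "201512", "201511", "201503",
                 "201503", "201506", "201507", "201503",
                 rep("201601", 5), rep("201602", 6), rep("201603", 7)),
  leave_time = c("201603", "201601", "201601", "201601", "201602", ".", ".",
                 "201602", ".", ".", "201602", ".", "201603", "201602",
                 "201602", ".", ".", "201603", rep(".", 10)),
  group = c(2, 2, 2, 3, 2, 1, 1, 3, 1, 1, 2, 1, 2, 3, 3, 1, 1, 3, rep(1, 10)),
  stringsAsFactors = FALSE
)

# Keep only customers who have left, then tag each with the year they joined
left <- question_table[question_table$leave_time != ".", ]
left$Joint_year <- substr(left$Joint_time, 1, 4)

# Three-way count: group x join year x leaving month
left_counts <- xtabs(~ group + Joint_year + leave_time, data = left)

# Flatten the table so that the leaving months run across the columns
print(ftable(left_counts, row.vars = c("group", "Joint_year")))
```

Combinations that never occur show up as 0, which plays the same role as `fill = 0` in the spread() call.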

To count how many people joined in each month of 2016, you can do this:

question_table %>% 
    filter(Joint_time %in% c('201601', '201602', '201603')) %>% 
    group_by(Joint_time) %>% 
    summarise(joined = n()) %>% 
    spread(Joint_time, joined, fill = 0) 

Source: local data frame [1 x 3] 

    201601 201602 201603 
    (dbl) (dbl) (dbl) 
1  5  6  7 

In this case it is probably better to skip the final spread and keep the data in long format. But that is up to you.
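As a rough sketch of what the long format looks like, here is a base R version that returns one row per month instead of one column per month. It only needs the joining months from the question. The name `joined_long` and the column name `joined` are chosen for this example.

```r
# Joining month of each customer, copied from the question's data
Joint_time <- c("201501", "201508", "201503", "201512", "201511", "201503",
                "201503", "201506", "201507", "201503",
                rep("201601", 5), rep("201602", 6), rep("201603", 7))

# One row per 2016 month with the number of customers who joined in it
joined_long <- as.data.frame(
  table(Joint_time = Joint_time[Joint_time >= "201601"]),
  responseName = "joined",
  stringsAsFactors = FALSE
)
print(joined_long)
```

A long table like this is easier to filter, join or plot later, and it can still be widened at the very end if a report needs months as columns.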

As for the last part, getting the total number of customers at the beginning of each period, you can do something like this:

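# Compare months as text; factor columns do not support < or >=
question_table$Joint_time <- as.character(question_table$Joint_time) 
question_table$leave_time <- as.character(question_table$leave_time) 

# For each month in which someone left, count the customers who joined
# before that month and had not left before it started
df <- data.frame(numberBeginning = sapply(
    sort(unique(question_table$leave_time[question_table$leave_time != '.'])),
    function(x) nrow(filter(question_table, Joint_time < x, leave_time == '.' | leave_time >= x))))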

If you want this last result in wide format, it takes a little more work:

df$period <- row.names(df) 
row.names(df) <- NULL 
df <- spread(df, period, numberBeginning) 

    201601 201602 201603 
1  10  12  13 

The code above can be modified slightly to get the last piece of information, the number of customers at the end of each month:

df <- data.frame(numberEnding = sapply(
    sort(unique(question_table$leave_time[question_table$leave_time != '.'])),
    function(x) nrow(filter(question_table, Joint_time <= x, leave_time == '.' | leave_time > x))))
df$period <- row.names(df) 
row.names(df) <- NULL 
df <- spread(df, period, numberEnding) 
df 
    201601 201602 201603 
1  12  13  17 

Thank you very much. How can I merge all four 'df' results vertically into one table?

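Sure, using 'rbind' might help. I just do not like keeping different kinds of data in the same data frame. You will need to save the four results under separate names rather than reusing 'df' as above. Also, if this meets your needs, perhaps you could upvote and accept the answer. – Gopala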
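For readers who do want the single combined table, here is one possible sketch of the rbind() approach in base R. It recomputes each piece under its own name and then stacks them. All object names (`beginning`, `joined`, `leavers`, `ending`, `summary_table`) and the row labels are invented for this example, and the data frame is rebuilt from the question's values.

```r
question_table <- data.frame(
  ID = 1:28,
  Joint_time = c("201501", "201508", "201503", "201512", "201511", "201503",
                 "201503", "201506", "201507", "201503",
                 rep("201601", 5), rep("201602", 6), rep("201603", 7)),
  leave_time = c("201603", "201601", "201601", "201601", "201602", ".", ".",
                 "201602", ".", ".", "201602", ".", "201603", "201602",
                 "201602", ".", ".", "201603", rep(".", 10)),
  group = c(2, 2, 2, 3, 2, 1, 1, 3, 1, 1, 2, 1, 2, 3, 3, 1, 1, 3, rep(1, 10)),
  stringsAsFactors = FALSE
)

months <- c("201601", "201602", "201603")
stayed <- question_table$leave_time == "."   # customers who never left

# Headcount at the start and at the end of each month
beginning <- sapply(months, function(m)
  sum(question_table$Joint_time < m & (stayed | question_table$leave_time >= m)))
ending <- sapply(months, function(m)
  sum(question_table$Joint_time <= m & (stayed | question_table$leave_time > m)))

# Customers who joined in each month
joined <- sapply(months, function(m) sum(question_table$Joint_time == m))

# Leavers per month, one row per group and joining year
left <- question_table[!stayed, ]
left$who <- paste0("Group ", left$group, ", joined ",
                   substr(left$Joint_time, 1, 4), ", left this month")
leavers <- unclass(table(left$who, factor(left$leave_time, levels = months)))

# Stack the pieces; named arguments become the row labels
summary_table <- rbind(
  "Total at month beginning" = beginning,
  "Joined this month" = joined,
  leavers,
  "Total at month end" = ending
)
print(summary_table)
```

A quick sanity check on the result is that, for every month, the beginning total plus the joiners minus the leavers equals the end total.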

After running the 'df <- data.frame()' code, it shows "'>=' not meaningful for factors". How can I fix this? Thank you.
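A likely cause of this warning is that, in R versions before 4.0.0, data.frame() converts character columns to factors by default, and `>=` is not defined for unordered factors. The as.character() conversion in the answer addresses this. The small sketch below shows both that fix and the alternative of passing stringsAsFactors = FALSE when the data frame is built. The names `old_style` and `new_style` are made up for the illustration.

```r
months <- c("201501", "201601", "201603")

# stringsAsFactors = TRUE reproduces the pre-4.0.0 default of data.frame()
old_style <- data.frame(Joint_time = months, stringsAsFactors = TRUE)
print(is.factor(old_style$Joint_time))  # the column is a factor here

# Fix 1: convert an existing factor column back to text
old_style$Joint_time <- as.character(old_style$Joint_time)

# Fix 2: never create the factor in the first place
new_style <- data.frame(Joint_time = months, stringsAsFactors = FALSE)

# Both columns now compare as text without a warning
print(old_style$Joint_time >= "201601")
print(new_style$Joint_time >= "201601")
```

Comparing the months as text works here only because every value has the same six-digit layout, so alphabetical order matches chronological order.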