2017-01-25

Create columns based off an index column

I have two datasets:

# 1.
user_id users frequency
1       1     3
2       1     1
3       1     1

# 2.
user_id sum unique
1       2   1
2       0   0
3       1   1
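To try the answers below, the two tables can be typed in as R data frames. This is a reconstruction from the listings above; the names `df1` (dataset 1) and `df2` (dataset 2) follow the second answer, while the first answer calls them `df` and `df1`.

```r
# Dataset 1: one row per user_id
df1 <- data.frame(
  user_id   = c(1L, 2L, 3L),
  users     = c(1L, 1L, 1L),
  frequency = c(3L, 1L, 1L)
)

# Dataset 2: same user_id values, different measures
df2 <- data.frame(
  user_id = c(1L, 2L, 3L),
  sum     = c(2L, 0L, 1L),
  unique  = c(1L, 0L, 1L)
)
```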

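I want to merge on user_id, but then combine the rows in an ordered way so that frequency acts as the index and user_id is left out of the picture, giving this output: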

# 3.
frequency users sum unique
3         1     2   1
1         2     1   1

Any ideas on how to achieve this? Also, for the sake of learning how to do these kinds of operations, is there a name for this type of operation?

Answers

library(data.table)
setDT(df)    # convert to a data.table in place, if it is a data.frame
setDT(df1)

# logic: first join both tables on user_id, then group by "frequency" and sum
df[df1, on = "user_id"][, lapply(.SD, sum), by = .(frequency), .SDcols = c("sum", "unique", "users")]
#    frequency sum unique users
# 1:         3   2      1     1
# 2:         1   1      1     2
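The bare `df[df1]` form (without `on =`) only joins once both tables are keyed on `user_id`. A minimal sketch of that keyed variant, rebuilding the question's data under the first answer's names; it also contrasts `by`, which keeps groups in order of first appearance (frequency 3, then 1, as in the question), with `keyby`, which sorts them.

```r
library(data.table)

# Same data as in the question (dataset 1 = df, dataset 2 = df1)
df  <- data.table(user_id = 1:3, users = c(1L, 1L, 1L), frequency = c(3L, 1L, 1L))
df1 <- data.table(user_id = 1:3, sum = c(2L, 0L, 1L), unique = c(1L, 0L, 1L))

# Key both tables on user_id so that X[i] joins on the key without on=
setkey(df, user_id)
setkey(df1, user_id)
merged <- df[df1]

cols <- c("users", "sum", "unique")

# by = keeps groups in order of first appearance: frequency 3, then 1
res_by <- merged[, lapply(.SD, sum), by = frequency, .SDcols = cols]

# keyby = sorts the groups instead: frequency 1, then 3
res_keyby <- merged[, lapply(.SD, sum), keyby = frequency, .SDcols = cols]
```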

@StuRichards Does it work on your real data? What am I missing?


Does df refer to dataset 1 and df1 to dataset 2?


@StuRichards Yes.


Here is an option using the tidyverse. We can do an inner_join between the two datasets, group by "frequency", and get the sum of the variables with summarise_each.

library(dplyr)
# note: summarise_each() and funs() are deprecated in current dplyr releases
inner_join(df1, df2, by = "user_id") %>%
     group_by(frequency) %>%
     summarise_each(funs(sum), sum, unique, users)
#   frequency   sum unique users
#       <int> <int>  <int> <int>
# 1         1     1      1     2
# 2         3     2      1     1
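Because `summarise_each()` and `funs()` have since been deprecated, here is a sketch of the same pipeline for dplyr 1.0.0 or later using `across()`. The data frames are rebuilt from the question; `base::sum` is spelled out only to avoid any ambiguity with the column that is also named `sum`.

```r
library(dplyr)

df1 <- data.frame(user_id = 1:3, users = c(1L, 1L, 1L), frequency = c(3L, 1L, 1L))
df2 <- data.frame(user_id = 1:3, sum = c(2L, 0L, 1L), unique = c(1L, 0L, 1L))

res <- inner_join(df1, df2, by = "user_id") %>%
  group_by(frequency) %>%
  # sum each listed column within each frequency group
  summarise(across(c(users, sum, unique), base::sum))
```

As with `summarise_each()`, the groups come out sorted by frequency (1, then 3).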

Or, using base R, we merge the datasets and do an aggregate:

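# merge on the shared user_id column, drop it with [-1], then sum every other column by frequency
aggregate(.~frequency, merge(df1, df2)[-1], FUN = sum)
#   frequency users sum unique
# 1         1     2   1      1
# 2         3     1   2      1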
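`aggregate()` sorts the groups, so frequency 1 comes first. If the row order of the question matters (frequency 3 first, i.e. order of first appearance), a sketch that reorders the result afterwards with `match()`, again rebuilding the question's data:

```r
df1 <- data.frame(user_id = 1:3, users = c(1L, 1L, 1L), frequency = c(3L, 1L, 1L))
df2 <- data.frame(user_id = 1:3, sum = c(2L, 0L, 1L), unique = c(1L, 0L, 1L))

merged <- merge(df1, df2, by = "user_id")

# aggregate() returns the groups sorted by frequency (1, then 3)
res <- aggregate(cbind(users, sum, unique) ~ frequency, data = merged, FUN = sum)

# Reorder rows to follow the first appearance of each frequency (3, then 1)
# and keep the column order shown in the question
res <- res[match(unique(merged$frequency), res$frequency),
           c("frequency", "users", "sum", "unique")]
rownames(res) <- NULL
```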