multidplyr and group_by() and filter()
I have the following data frame, and my goal is to find every ID that has different USAGE values but the same TYPE.
ID <- rep(1:4, each=3)
USAGE <- c("private","private","private","private",
"taxi","private","taxi","taxi","taxi","taxi","private","taxi")
TYPE <- c("VW","VW","VW","VW","MER","VW","VW","VW","VW","VW","VW","VW")
df <- data.frame(ID,USAGE,TYPE)
If I run the following (with dplyr loaded)
df %>% group_by(ID, TYPE) %>% filter(n_distinct(USAGE)>1)
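I get the expected result. However, my original data frame has more than 2 million rows, so I would like to use all of my cores to run this operation.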
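To make "expected result" concrete, here is a minimal cross-check in base R, assuming the toy data above. It is only a sketch for verifying the logic on a small sample, not a faster alternative; the names `n_usage` and `expected` are introduced here for illustration.

```r
# Same toy data as in the question.
ID <- rep(1:4, each = 3)
USAGE <- c("private", "private", "private", "private",
           "taxi", "private", "taxi", "taxi", "taxi", "taxi", "private", "taxi")
TYPE <- c("VW", "VW", "VW", "VW", "MER", "VW", "VW", "VW", "VW", "VW", "VW", "VW")
df <- data.frame(ID, USAGE, TYPE)

# For every row, count the distinct USAGE values inside its (ID, TYPE) group.
# ave() applies FUN per group and returns a vector aligned with the rows.
n_usage <- ave(seq_len(nrow(df)), df$ID, df$TYPE,
               FUN = function(i) length(unique(df$USAGE[i])))

# Keep the rows whose group contains more than one distinct USAGE.
expected <- df[n_usage > 1, ]
print(expected)
```

On this toy data the only group with more than one USAGE is ID 4 with TYPE VW, so the three rows of ID 4 are kept.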
I tried this code with multidplyr:
f1 <- partition(df, ID)
f2 <- f1 %>% group_by(ID, TYPE) %>% filter(n_distinct(USAGE)>1)
f3 <- collect(f2)
But then the following messages appear:
Warning message: group_indices_.grouped_df ignores extra arguments
after running
f1 <- partition(df, ID)
and
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
4 nodes produced errors; first error: Evaluation error: object 'f1' not found.
after running
f2 <- f1%>% group_by(ID, TYPE) %>% filter(f1, n_distinct(USAGE)>1)
What would be the correct way to implement the whole operation in multidplyr? Thank you very much.
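One possible direction, offered as a sketch rather than a confirmed fix: the error message suggests that the `f1` passed as an extra argument to `filter()` is being looked up on the worker nodes, where no such object exists, so dropping it from the call seems the first thing to try. The version below assumes the CRAN release of multidplyr (0.1.0 or later), whose interface (`new_cluster()`, `cluster_library()`, `partition(data, cluster)`) differs from the `partition(df, ID)` form used above. The worker count of 2 is an arbitrary assumption.

```r
library(dplyr)
library(multidplyr)

# Same toy data as in the question.
ID <- rep(1:4, each = 3)
USAGE <- c("private", "private", "private", "private",
           "taxi", "private", "taxi", "taxi", "taxi", "taxi", "private", "taxi")
TYPE <- c("VW", "VW", "VW", "VW", "MER", "VW", "VW", "VW", "VW", "VW", "VW", "VW")
df <- data.frame(ID, USAGE, TYPE)

# Assumption: two workers; adjust to the number of cores available.
cluster <- new_cluster(2)

# Workers start as bare R sessions, so load dplyr on each of them
# to make n_distinct() available there.
cluster_library(cluster, "dplyr")

result <- df %>%
  group_by(ID, TYPE) %>%            # group first so each group stays on one worker
  partition(cluster) %>%            # spread the groups across the workers
  filter(n_distinct(USAGE) > 1) %>% # note: no f1 argument inside filter()
  collect()                         # bring the filtered rows back to the main session

print(result)
```

Whether this is actually faster than plain dplyr for 2 million rows depends on how long it takes to copy the data to the workers, so it is worth timing both versions on a sample first.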