2017-07-27

Multiple tidying operations in a single pipe

This is more of a code-cleanup exercise than anything else. My initial data looks like this:

Year  County  Town  ...  Funding Received  ...  (90+ variables total)
2016  a       x          Yes
2015  a       y          No
2014  a       x          Yes
2016  b       z          Yes

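I couldn't see how to count the number of submitted and approved applications directly, so I converted the county into indicator variables with the following code in order to count them: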

counties <- original_data %>%
    select(county, funded, year) %>%
    mutate(
        a = ifelse(county == "a", 1, 0),
        b = ifelse(county == "b", 1, 0),
        c = ifelse(county == "c", 1, 0),
        ... etc ...
    )

and the output looks like this:

County Funding Received Year binary.a binary.b 
    a    Yes   2016  1   0 
    a    No   2015  1   0 
    b    No   2016  0   1 

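This data is then turned into two data frames (submitted and funded) to get the counts of submitted and funded applications per county per year, using the following code: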

countysum <- counties %>%
    select(-funded) %>%
    group_by(county, year) %>%
    summarise_all(sum, na.rm = TRUE)

and the output looks like this:

County Year sum.a sum.b 
    a  2016  32  0 
    a  2015  24  0 
    b  2016  0  16 

But then I use a couple of commands to get the data into a tidy format:

countysum$submitted <- rowSums(countysum[, 3:15], na.rm = TRUE) # 3:15 are the county indicator vars
countysum <- countysum[,-c(3:19)] 

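Now my question is: is there a way to condense all of these operations into a single pipe? Right now my code works, but I would like code that works and is also easier to follow. Sorry for the lack of data; I am not able to share it.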

Take a look at 'tidyr::spread' - I think that's what you were trying to do in the first part –

Please show a small reproducible example. Your code uses 'funded', but it doesn't appear in the example data – akrun

@akrun My mistake, 'funded' corresponds to "Funding Received" in the original data. – MokeEire

Answer

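I'm not sure I fully understand what your final desired output looks like, but I think you can take advantage of the fact that logical values are coerced to integers and skip creating the dummy columns.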

library(dplyr) 

byyear <- original_data %>%
    group_by(county, year) %>%
    summarize(
      wasfunded = any(funded == "Yes", na.rm = TRUE)
    , submittedapplication = any(submittedapp == "Yes", na.rm = TRUE) # I'm assuming did/didn't submit is one of the other variables
    )

# if you don't need the byyear data for something else (I always seem to),
# you can pipe that straight into this next line
yrs_funded_by_county <- byyear %>%
    summarize(
      n_yrs_funded = sum(wasfunded) # TRUE/FALSE are coerced to 1/0 when summed
    , n_yrs_submitted = sum(submittedapplication)
    , pct_awarded = n_yrs_funded/n_yrs_submitted # maybe you don't need an award rate, but I threw it in b/c it's the kind of stuff my grant person cares about
    )
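
If what you want is the number of applications (rather than the number of years) per county and year, the same coercion trick should work in a single pipe. This is only a minimal sketch: `toy_data` is a made-up stand-in for `original_data`, and it assumes that each row is one submitted application, that `funded` holds "Yes"/"No", and that dplyr 1.0.0 or later is installed (for the `.groups` argument).

```r
library(dplyr)

# Hypothetical toy data; the real original_data was not shared
toy_data <- tibble(
  year   = c(2016, 2016, 2015, 2015, 2016),
  county = c("a", "a", "a", "a", "b"),
  funded = c("Yes", "No", "Yes", NA, "Yes")
)

county_counts <- toy_data %>%
  group_by(county, year) %>%
  summarise(
    submitted = n(),                                # assumption: one row = one submitted application
    n_funded  = sum(funded == "Yes", na.rm = TRUE), # TRUE counts as 1, FALSE as 0, NA is skipped
    .groups   = "drop"                              # return an ungrouped tibble
  )

print(county_counts)
```

This gives one row per county and year with both counts, so the indicator columns, `rowSums()` and the column-dropping step should not be needed. If a row does not always mean a submission, `submitted` would need its own condition, the way `n_funded` has one.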