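Using dplyrXdf to convert a continuous variable to categorical
I am trying to do some initial exploration of some data. I am analysing continuous variables one at a time by converting each one to a factor and calculating frequencies by band.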
I would like to do this with dplyrXdf, but what I am trying doesn't seem to work the same way it does with normal dplyr.
sample_data <- RxXdfData("./data/test_set.xdf") #sample xdf for testing
as_data_frame <- rxXdfToDataFrame(sample_data) #same data as dataframe
# Calculate freq by Buildings Sum Insured band
With my sample data imported as a data frame, the code below works:
buildings_ad_fr <- as_data_frame %>%
  mutate(bd_cut = cut(BD_INSURED_VALUE, seq(from = 150000, to = 10000000, by = 5000000))) %>%
  group_by(bd_cut) %>%
  summarise(exposure = sum(BENEFIT_EXPOSURE, na.rm = TRUE),
            ad_pd_f = sum(ACT_AD_PD_CLAIM_COUNT)/sum(BENEFIT_EXPOSURE, na.rm = TRUE))
But I can't do the same thing with the xdf version of the data:
buildings_ad_fr_xdf <- sample_data %>%
  mutate(bd_cut = cut(BD_INSURED_VALUE, seq(from = 150000, to = 10000000, by = 5000000))) %>%
  group_by(bd_cut) %>%
  summarise(exposure = sum(BENEFIT_EXPOSURE, na.rm = TRUE),
            ad_pd_f = sum(ACT_AD_PD_CLAIM_COUNT)/sum(BENEFIT_EXPOSURE, na.rm = TRUE))
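The workaround I can think of is to create the new column with rxDataStep, passing bd_cut = cut(BD_INSURED_VALUE, seq(from = 150000, to = 10000000, by = 5000000)) in the transforms argument, but an intermediate step like that shouldn't be necessary.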
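A side note on the breaks used in all of these attempts: seq(from = 150000, to = 10000000, by = 5000000) stops at the last value that does not exceed `to`, so it returns only 150000 and 5150000. cut() therefore produces a single band, and any insured value above 5,150,000 becomes NA. The base R sketch below uses made-up values (not the question's data) to show this; extending `to` to 10150000 is only one hypothetical way to cover the whole range, and if narrower bands were intended, a smaller `by` would be needed.

```r
# Made-up insured values, for illustration only (not the question's data)
bd_insured_value <- c(200000, 750000, 4900000, 6000000, 9500000)

# Breaks as written in the question: seq() never goes past `to`,
# so this yields only two break points (150000 and 5150000)
breaks_question <- seq(from = 150000, to = 10000000, by = 5000000)
bands_question <- cut(bd_insured_value, breaks_question)

# Values above the last break fall outside every interval and become NA
print(table(bands_question, useNA = "ifany"))

# Hypothetical fix: extend `to` so the last break covers the maximum value
breaks_full <- seq(from = 150000, to = 10150000, by = 5000000)
bands_full <- cut(bd_insured_value, breaks_full)
print(table(bands_full, useNA = "ifany"))
```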
I also tried using the .rxArgs argument before the group_by expression, but that doesn't seem to work either:
buildings_ad_fr <- sample_data %>%
  # the pipe already supplies sample_data, so it is not repeated inside mutate()
  mutate(.rxArgs = list(transforms = list(bd_cut = cut(BD_INSURED_VALUE,
                                                       seq(150000,
                                                           10000000,
                                                           5000000))))) %>%
  group_by(bd_cut) %>%
  summarise(exposure = sum(BENEFIT_EXPOSURE, na.rm = TRUE),
            ad_pd_f = sum(ACT_AD_PD_CLAIM_COUNT)/sum(BENEFIT_EXPOSURE, na.rm = TRUE))
In both cases, with the xdf file, I now get this error:
Error in summarise.RxFileData(., exposure = sum(BENEFIT_EXPOSURE, na.rm = TRUE),: with xdf tbls only works with named variables, not expressions
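The error message suggests that, on an xdf source, summarise() only accepts summary functions applied directly to named columns, whereas sum(ACT_AD_PD_CLAIM_COUNT)/sum(BENEFIT_EXPOSURE, na.rm = TRUE) is an expression. One common way around that kind of restriction is to aggregate only plain per-band sums first and compute the ratio in a separate step. The sketch below shows that two-step pattern in plain base R on made-up data, because RevoScaleR and dplyrXdf may not be installed; the `claims` column is a hypothetical name. In dplyr terms it would be summarise(exposure = sum(BENEFIT_EXPOSURE), claims = sum(ACT_AD_PD_CLAIM_COUNT)) followed by mutate(ad_pd_f = claims / exposure), but whether that exact form runs against an xdf file is an assumption to check.

```r
# Made-up policy data standing in for the xdf contents (illustration only)
policies <- data.frame(
  BD_INSURED_VALUE      = c(200000, 750000, 4900000, 6000000, 9500000),
  BENEFIT_EXPOSURE      = c(1.0, 0.5, 1.0, NA, 0.25),
  ACT_AD_PD_CLAIM_COUNT = c(0, 1, 2, 0, 1)
)

# Bin the continuous variable (breaks extended so every value falls in a band)
policies$bd_cut <- cut(policies$BD_INSURED_VALUE,
                       seq(from = 150000, to = 10150000, by = 5000000))

# Step 1: plain sums of named columns per band, no derived expressions
exposure <- tapply(policies$BENEFIT_EXPOSURE, policies$bd_cut, sum, na.rm = TRUE)
claims   <- tapply(policies$ACT_AD_PD_CLAIM_COUNT, policies$bd_cut, sum)

# Step 2: compute the claim frequency from the aggregated totals
band_summary <- data.frame(
  bd_cut   = names(exposure),
  exposure = as.vector(exposure),
  claims   = as.vector(claims)
)
band_summary$ad_pd_f <- band_summary$claims / band_summary$exposure
print(band_summary)
```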
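I know this package can factorise variables, but I don't know how to use it to split a continuous variable into bands.
Does anyone know how to do this?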
Thanks again! Your solution worked, and dplyrXdf is awesome. I'm going to update right away!
Here is a [blog post](http://blog.revolutionanalytics.com/2017/08/dplyrxdf-0100-beta-prerelease.html) about the new features in dplyrXdf 0.10.