I want to create a data frame based on information in another data frame. Why is this R code so slow?
The first data frame (base_mar_bop) contains data such as:
201301|ABC|4
201302|DEF|12
From it, I want to create a data frame with 16 rows:
4 times: 201301|ABC|1
12 times: 201302|DEF|1
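I wrote a script that takes a very long time to run. To give an idea of the scale, the final data frame has about 2 million rows and the source data frame has about 10k rows. Because the data is confidential, I cannot post the source file for the data frame.
Since this code was taking ages to run, I decided to do it in PHP instead. It ran in under a minute and got the job done, writing the result to a txt file that I then imported into R.
I don't know why R takes so long. Is it the function call? Is it the nested for loop? From my point of view, there are not that many computationally intensive steps in there.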
# first create an empty dataframe called base_eop that will hold each subscriber on a row,
# identified by CED, RATEPLAN and 1
# where 1 is the count and the sum of 1 should end up with the base
base_eop <- base_mar_bop[1,]
# let's give some logical names to the columns in the df
names(base_eop) <- c('CED','RATEPLAN','BASE')
# define the function that enables us to insert a row at the bottom of the dataframe
insertRow <- function(existingDF, newrow, r) {
  existingDF[seq(r+1,nrow(existingDF)+1),] <- existingDF[seq(r,nrow(existingDF)),]
  existingDF[r,] <- newrow
  existingDF
}
# now loop through the eop base for march, each row contains the ced, rateplan and number of subs
# we need to insert a row for each individual sub
for (i in 1:nrow(base_mar_eop)) {
  # we go through every row in the dataframe
  for (j in 1:base_mar_eop[i,3]) {
    # we insert a row for each CED, rateplan combination and set the base value to 1
    base_eop <- insertRow(base_eop,c(base_mar_eop[i,1:2],1),nrow(base_eop))
  }
}
# since the dataframe was created using the first row of base_mar_bop we need to remove this first row
base_eop <- base_eop[-1,]
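You'd be much better off defining the whole data frame up front and then filling it in, rather than appending rows. I think this is discussed in Pat Burns's "R Inferno". Also consider using the 'data.table' package for large operations like this. – 2013-04-24 21:45:36
Please provide a small (really small, something you can put in the code above) reproducible example dataset – eddi 2013-04-24 21:45:42
Should the second line in the output example be '201302 | DEF | 1' (i.e. 1 rather than 12)? – 2013-04-24 21:46:42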
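The advice in the first comment (build the result in one go instead of appending rows one at a time) could be sketched roughly as follows in base R. This is only an illustration under assumptions, not a tested replacement for the script: the small data frame is a stand-in built from the two example rows in the question, the column names are the ones the script assigns, and `idx` is a hypothetical helper name. Each call to insertRow in the loop copies the growing data frame, which is a likely reason for the slowness; here the row indices are expanded with rep() and the data frame is indexed once.

```r
# Stand-in for the real data (an assumption): the two example rows from the question
base_mar_bop <- data.frame(
  CED      = c(201301, 201302),
  RATEPLAN = c("ABC", "DEF"),
  BASE     = c(4, 12),
  stringsAsFactors = FALSE
)

# Repeat each row index as many times as its BASE count (idx is a hypothetical name)
idx <- rep(seq_len(nrow(base_mar_bop)), times = base_mar_bop$BASE)

# Index the source data frame once, then set the count of every expanded row to 1
base_eop <- base_mar_bop[idx, c("CED", "RATEPLAN")]
base_eop$BASE <- 1
rownames(base_eop) <- NULL

nrow(base_eop)      # number of expanded rows, one per subscriber
sum(base_eop$BASE)  # should match sum(base_mar_bop$BASE)
```

With this approach no header row has to be created and removed afterwards, and a source row with a count of 0 simply contributes no rows.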