Create a data frame with a specific number of rows in R

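I have a large JSON file, more than 2 GB in size. Because the data is so large, I cannot create a data frame from the whole dataset. I want to parse specific information and write it to a CSV file.

So I am looking for a way to create a data frame with a specific number of rows.

Suppose I have 2M rows. When I parse my JSON into a data frame, I want each pass to create a data frame of only 10k-15k rows, and then write some of the information to a CSV file.

Each pass would handle 10k-15k rows, until all 2M rows have been processed.

I am working with the tidyjson and dplyr packages.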

Source

2017-06-23 Sirajus Salayhin

How about splitting the huge JSON file into smaller ones outside of R? – amonk

Can you show us what you have tried so far? – loki

I suggest splitting a file this large into smaller ones and then reading them in parallel:

library(parallel)

# Locate the split JSON files. Note that `pattern` is a regular expression, not a glob.
json_files <- list.files(path = "path/to/jsons", pattern = "\\.json$", full.names = TRUE)

# Leave one core free and start a cluster (parLapply needs only the cluster, no doParallel registration).
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)

# Parse each file on a worker; the rjson package must be installed.
system.time(json_list <- parLapply(cl, json_files, function(x) rjson::fromJSON(file = x, method = "R")))

# Once we are done, close the cluster so that resources such as memory are returned to the operating system.
stopCluster(cl)
gc()  # just a garbage collection call

You now have a list that holds all of the imported data.
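To get from there to the CSV output the question asks for without ever building one huge data frame, the records can be processed in batches. The following is only a minimal sketch under stated assumptions: it assumes the file is newline-delimited JSON (one record per line), which the question does not confirm, and it uses jsonlite::stream_in instead of tidyjson or rjson. The helper name json_to_csv_in_chunks and its arguments are hypothetical.

```r
library(jsonlite)

# Hypothetical helper: stream a newline-delimited JSON file in batches of
# `chunk_size` records and append the selected columns of each batch to a CSV.
# Assumes one JSON object per line and that `keep_cols` are top-level scalar fields.
json_to_csv_in_chunks <- function(json_path, csv_path, keep_cols, chunk_size = 10000) {
  if (file.exists(csv_path)) file.remove(csv_path)
  rows_written <- 0

  # Called by stream_in once per batch; `df` holds at most `chunk_size` rows.
  handler <- function(df) {
    # Add any column missing from this batch so every batch has the same layout.
    for (col in setdiff(keep_cols, names(df))) df[[col]] <- NA
    chunk <- df[, keep_cols, drop = FALSE]
    first <- rows_written == 0
    # Write the header only with the first batch, then append.
    write.table(chunk, csv_path, sep = ",", row.names = FALSE,
                col.names = first, append = !first, qmethod = "double")
    rows_written <<- rows_written + nrow(chunk)
  }

  stream_in(file(json_path), handler = handler, pagesize = chunk_size, verbose = FALSE)
  invisible(rows_written)
}
```

A call such as json_to_csv_in_chunks("path/to/big.json", "out.csv", keep_cols = c("id", "name"), chunk_size = 10000) would then keep only one batch in memory at a time. The file paths and column names here are placeholders, not taken from the question.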

Source

2017-06-23 09:33:32 amonk
