Recursively reading CSV files and printing status

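I'm an R beginner who recently switched from Stata, so this has been an uphill battle. I managed to write a vectorized command that reads CSV files in bulk, as described in "Sapply vs. Lapply while reading files with factors". Here is my code: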

# Match only names ending in ".csv" (the pattern is a regular expression)
filenames <- list.files(path = "~/Documents/R Programming/Data/", pattern = "\\.csv$")
appended_filename <- sapply(filenames, function(x) paste("~/Documents/R Programming/Data/", x, sep = ""))

Merged_file<-do.call(rbind,lapply(appended_filename,read.csv))

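However, I have more than 50 files. My challenge is that I have no way of knowing whether there was a problem reading any particular file. Is there a way to print a status such as "1 2 ..." (I'm not looking for anything fancy, just an update on what is happening) so that I know how many files have been read?

I'm a beginner, so I don't know how to add a function that would give me some visibility into this. As a fallback, before running the command above, I hand-coded a read.csv() call for each file to test and check it, followed by a final rbind(). That was very painful.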

Source

2016-09-28 watchtower

You can check http://stackoverflow.com/questions/12193779/how-to-write-trycatch-in-r – akrun
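To make the tryCatch suggestion concrete, here is a minimal sketch of how a failed read could be reported without stopping the whole run. The helper name read_csv_safely and the demo files are hypothetical and not part of the original thread; only base R functions are used.

```r
# Hypothetical helper: read one CSV and report a failure instead of stopping.
read_csv_safely <- function(path) {
  tryCatch(
    read.csv(path),
    error = function(e) {
      message("Failed to read ", path, ": ", conditionMessage(e))
      NULL  # mark this file as failed
    }
  )
}

# Demo data (assumed for illustration): one good file and one path that does not exist.
td <- tempfile("csv_demo_")
dir.create(td)
good   <- file.path(td, "good.csv")
absent <- file.path(td, "absent.csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), good, row.names = FALSE)

paths   <- c(good, absent)
results <- lapply(paths, read_csv_safely)

# Keep track of which files failed, then bind only the successful reads.
failed <- paths[vapply(results, is.null, logical(1))]
merged <- do.call(rbind, Filter(Negate(is.null), results))

unlink(td, recursive = TRUE)
```

Returning NULL for a failed file keeps the list the same length as the input, so the failed paths can be recovered by position afterwards.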

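You can use an anonymous function in lapply, just as you did in the sapply call above. Inside that function you can print the file name, read the file, and do anything else you want. So instead of applying read.csv directly to each element of appended_filename, you can do something like this: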

do.call(rbind, lapply(appended_filename, function(x) {print(x); read.csv(x)}))

You can also use rbind.fill (from the plyr package) to combine a list of data frames. It is a little cleaner than do.call.

rbind.fill(lapply(appended_filename, function(x) {print(x); read.csv(x)}))
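If a running count such as "1 2 ..." is preferred over full file names, the same anonymous-function idea can iterate over indices instead. This is only a sketch: the temporary files below are made up for the demonstration, and message() is one of several ways to emit the status.

```r
# Demo data (assumed for illustration): three small CSV files in a temp directory.
td <- tempfile("csv_demo_")
dir.create(td)
for (i in 1:3) {
  write.csv(head(mtcars, 2), file.path(td, sprintf("part%d.csv", i)), row.names = FALSE)
}
files <- list.files(td, pattern = "\\.csv$", full.names = TRUE)

# Iterate over positions so the counter and the file name are both available.
frames <- lapply(seq_along(files), function(i) {
  message(i, " of ", length(files), ": ", basename(files[i]))
  read.csv(files[i])
})
merged <- do.call(rbind, frames)

unlink(td, recursive = TRUE)
```

If a read fails, the last message printed before the error identifies the offending file.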

Source

2016-09-28 16:53:28 jdoubleyou

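This is a great response. One quick question: could you explain the last part of your answer, "You can also use rbind.fill (from the plyr package) to combine a list of data frames. It is a little cleaner than do.call"? I'm not sure what "cleaner" means here. I sincerely appreciate your thoughts. I'm a beginner, so I apologize if this question is too naive. – watchtower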

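It's personal preference, but from the documentation, 'do.call constructs and executes a function call...'. That makes it a kind of meta-function that takes a function and a list as arguments. With that approach you need two functions (do.call and lapply). rbind.fill wraps that specific task (binding a list of data frames) into one function. Using rbind.fill is more direct because you only need to call that one function. I think it's worth your time to learn the features and philosophy of plyr and dplyr. Try the creator's book, [R for Data Science](http://r4ds.had.co.nz/). – jdoubleyou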
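A small sketch may help show what "constructs and executes a function call" means in practice: do.call(rbind, frames) behaves like writing rbind() with each list element as a separate argument. The two data frames here are invented for the example.

```r
# Two toy data frames (assumed for illustration).
a <- data.frame(id = 1:2, value = c(10, 20))
b <- data.frame(id = 3:4, value = c(30, 40))
frames <- list(a, b)

# do.call spreads the list elements out as the arguments of rbind ...
via_do_call <- do.call(rbind, frames)

# ... so it gives the same result as spelling the call out by hand.
direct <- rbind(a, b)

same_result <- isTRUE(all.equal(via_do_call, direct))
```

The advantage of do.call is that it works for a list of any length, which is why it pairs naturally with lapply when the number of files is not known in advance.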

A progress bar might be a better way to go:

library(purrr) 
library(dplyr) 

td <- tempdir() 

# Make 100 copies of mtcars in a temporary directory 
walk(1:100, ~write.csv(mtcars, file.path(td, sprintf("mtcars%02d.csv", .)), row.names=FALSE)) 

# Get a list of the files. dir() == list.files(), just shorter 
fils <- dir(td, pattern="\\.csv$", full.names=TRUE)

# Inspect the list 
head(fils) 
## [1] "/var/folders/3r/zg9pcxys4dqg4j7_bqbn3c0h0000gn/T//RtmpW0AVZ2/mtcars01.csv" 
## [2] "/var/folders/3r/zg9pcxys4dqg4j7_bqbn3c0h0000gn/T//RtmpW0AVZ2/mtcars02.csv" 
## [3] "/var/folders/3r/zg9pcxys4dqg4j7_bqbn3c0h0000gn/T//RtmpW0AVZ2/mtcars03.csv" 
## [4] "/var/folders/3r/zg9pcxys4dqg4j7_bqbn3c0h0000gn/T//RtmpW0AVZ2/mtcars04.csv" 
## [5] "/var/folders/3r/zg9pcxys4dqg4j7_bqbn3c0h0000gn/T//RtmpW0AVZ2/mtcars05.csv" 
## [6] "/var/folders/3r/zg9pcxys4dqg4j7_bqbn3c0h0000gn/T//RtmpW0AVZ2/mtcars06.csv" 

# Use a progress bar based on total # of files to read 
pb <- progress_estimated(length(fils)) 

map_df(fils, function(x) { # map_df will automagically append all the data frames together 
    pb$tick()$print()   # increment the progress bar 
    read.csv(x) 
}) -> df 

# see what we've got 
glimpse(df) 
## Observations: 3,200 
## Variables: 11 
## $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.... 
## $ cyl <int> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, ... 
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 1... 
## $ hp <int> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, ... 
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.9... 
## $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3... 
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 2... 
## $ vs <int> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, ... 
## $ am <int> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ... 
## $ gear <int> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, ... 
## $ carb <int> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, ... 

# cleanup those files 
walk(fils, unlink)
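As far as I know, progress_estimated() has been deprecated in more recent dplyr releases, so the code above may emit a deprecation warning on a current installation. A sketch of the same idea using only base R's txtProgressBar() follows; the five demo files are an assumption made for illustration.

```r
# Demo data (assumed for illustration): five copies of mtcars in a temp directory.
td <- tempfile("pb_demo_")
dir.create(td)
for (i in 1:5) {
  write.csv(mtcars, file.path(td, sprintf("mtcars%02d.csv", i)), row.names = FALSE)
}
fils <- list.files(td, pattern = "\\.csv$", full.names = TRUE)

# Text progress bar from the utils package, sized to the number of files.
pb <- txtProgressBar(min = 0, max = length(fils), style = 3)

frames <- vector("list", length(fils))
for (i in seq_along(fils)) {
  frames[[i]] <- read.csv(fils[i])
  setTxtProgressBar(pb, i)  # advance the bar after each file is read
}
close(pb)

df <- do.call(rbind, frames)

# Clean up the demo files.
unlink(td, recursive = TRUE)
```

This version avoids extra package dependencies at the cost of an explicit loop.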

Source

2016-09-28 18:17:23 hrbrmstr
