2017-06-28 53 views
0

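I have a large number of folders, each containing csv and htm files (some folders have several csv files, some have only one). How can I pick out, from all of these folders, the ones that contain exactly one csv file?

Is it possible to automatically screen for the folders that have only one csv file and import that data into R or another statistical package?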

+0

If the CSV files you want to read share a common pattern, you can use `list.files` with the `pattern` argument.
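A minimal sketch of what the comment suggests, run against a throwaway directory so it is self-contained. The directory name `pattern_demo` and the file names `a.csv`, `b.htm`, and `notes.csv.bak` are made up for illustration and are not from the question.

```r
# create a temporary directory holding a few hypothetical files
demo_dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.csv", "b.htm", "notes.csv.bak")))

# keep only names that end in ".csv":
# the dot is escaped, and "$" anchors the match at the end of the name
csv_only <- list.files(demo_dir, pattern = "\\.csv$")
print(csv_only)
```

Without the `$` anchor, a name such as `notes.csv.bak` would also match, which is probably not what is wanted here.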

Answers

0
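# list all files in the current working directory
all_files <- list.files()
# extract each file's extension (the text after the last dot)
extensions <- tools::file_ext(all_files)

# read every csv file into a list, one data frame per file,
# so that later files do not overwrite earlier ones
data_files <- list()
for (i in seq_along(all_files)) {
    if (extensions[i] == "csv") {
        data_files[[all_files[i]]] <- read.csv(all_files[i])
    }
}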
0

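The OP asked to search all directories for csv files, but to consider only those directories that contain exactly one csv file. Only those files should be imported.

On UNIX systems there are operating system commands such as fgrep that could probably be used for this purpose, but I believe the following base R solution should work on any system: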

# define starting dir 
path <- file.path("path", "to", "start", "search") 
# or path <- file.path(".") 
# or path <- getwd() 

# find all directories, recursively, i.e., also sub-directories 
dirs <- list.dirs(path, recursive = TRUE) 

# search all directories for csv files, i.e., file name is ending with csv 
# return result as a list with a vector of file names per list element 
csv_files <- lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE) 

# pick only those list elements which contain exactly one .csv file 
# and unlist to get vector of file names. 
# note lengths() gets the length of each element of a list 
files_to_read <- unlist(csv_files[lengths(csv_files) == 1L]) 

# read selected files, return result in a list 
imported <- lapply(files_to_read, data.table::fread) 
# or use a different file reader, alternatively 
# imported <- lapply(files_to_read, readr::read_csv) 

# name list elements to identify imported data sets 
names(imported) <- files_to_read 
# or use only the file name 
# names(imported) <- basename(files_to_read) 
# or use only the name of the enclosing directory 
# names(imported) <- basename(dirname(files_to_read)) 
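
To check that the selection behaves as described, the steps above can be tried on a small throwaway directory tree. This is only a sketch: the folder names `one`, `two`, `three` and all file names are hypothetical, and base R's `read.csv` is swapped in for `data.table::fread` so that no extra packages are needed.

```r
# build a hypothetical directory tree under a fresh temporary folder
root <- tempfile("csv_demo_")
dir.create(root)
for (d in c("one", "two", "three")) dir.create(file.path(root, d))

# "one" and "three" hold exactly one csv each, "two" holds two
write.csv(data.frame(x = 1:3), file.path(root, "one", "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:2), file.path(root, "two", "b.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:5), file.path(root, "two", "c.csv"), row.names = FALSE)
write.csv(data.frame(x = 7:9), file.path(root, "three", "d.csv"), row.names = FALSE)
# an htm file next to a csv must not affect the count
writeLines("<html></html>", file.path(root, "three", "page.htm"))

# same selection logic as in the answer
dirs <- list.dirs(root, recursive = TRUE)
csv_files <- lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE)
files_to_read <- unlist(csv_files[lengths(csv_files) == 1L])

# import with base R and name each data set after its enclosing directory
imported <- lapply(files_to_read, read.csv)
names(imported) <- basename(dirname(files_to_read))
print(names(imported))
```

Folder `two` is skipped because it contains two csv files, while the htm file in `three` is ignored by the pattern, so only the data from `one` and `three` is imported.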