Reading multiple .csv files from different directories into a pandas DataFrame

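My DataFrame has an index, SubjectID, and each subject ID has its own directory. Each subject directory contains a .csv file with information I want to put into my DataFrame. Using my SubjectID index, I want to read the header of each subject's .csv file and put it into a new column in my DataFrame.

Apart from the individual subject number, every subject directory has the same path.

I have found ways to read multiple .csv files from a single target directory into a pandas DataFrame, but not from multiple directories. Here is some code I have for importing multiple .csv files from one target directory: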

import glob
import os
import pandas as pd

subject_path = '/home/mydirectory/SubjectID/'
# collect the full paths of the .csv files in the single target directory
filelist = glob.glob(os.path.join(subject_path, '*.csv'))

# read each csv file into a single dataframe and add a filename reference column
columns = range(1, 100)
frames = []
for c, f in enumerate(filelist):
    frame = pd.read_csv(f, skiprows=1, index_col=0, names=columns)
    frame['key'] = "file%i" % c
    frames.append(frame)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df = pd.concat(frames, ignore_index=True)

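I would like to do something similar, but iterate over the different subject directories instead of using a single target directory.

Edit: I think I need to do this with os rather than pandas. Is there a way to search through multiple directories with os in a loop?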

Source

2016-10-03 MScar

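The code above is what I have tried for importing .csv files from a single directory; the problem is that I am not sure how to adapt it to import files from multiple directories. – MScar

Maybe use a loop and search across multiple subject paths? –

Would I use 'os' to do this? It doesn't look like something that can be done in pandas. – MScar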

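Assuming your subject folders are located in mydirectory, you can build a list of all the folders in that directory and then add the csv files in each one to your file list.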

import os

parent_dir = '/home/mydirectory'
# every sub-folder of parent_dir is treated as one subject directory
subject_dirs = [os.path.join(parent_dir, d) for d in os.listdir(parent_dir)
                if os.path.isdir(os.path.join(parent_dir, d))]

filelist = []
for subject_dir in subject_dirs:
    # keep only regular files whose name ends with .csv
    csv_files = [os.path.join(subject_dir, f) for f in os.listdir(subject_dir)
                 if os.path.isfile(os.path.join(subject_dir, f)) and f.endswith('.csv')]
    filelist.extend(csv_files)

# Do what you did with the dataframe from here 
...
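To tie this back to the question (one header per subject, stored in a new column), here is a minimal sketch. It assumes each file lives at parent_dir/SubjectID/data.csv; the file name data.csv, the helper add_csv_headers and the column name csv_header are hypothetical and not from the original post, so adjust them to your layout.

```python
import os
import pandas as pd

def add_csv_headers(df, parent_dir, csv_name='data.csv'):
    """Return a copy of df with a 'csv_header' column (hypothetical name).

    Assumes the layout <parent_dir>/<SubjectID>/<csv_name>, where
    SubjectID is the index of df.
    """
    headers = []
    for subject_id in df.index:
        path = os.path.join(parent_dir, str(subject_id), csv_name)
        if os.path.isfile(path):
            # nrows=0 parses only the header row, not the data rows
            headers.append(list(pd.read_csv(path, nrows=0).columns))
        else:
            headers.append(None)  # no file found for this subject
    out = df.copy()
    # an object-dtype Series keeps each header as a plain Python list
    out['csv_header'] = pd.Series(headers, index=df.index, dtype=object)
    return out
```

Calling add_csv_headers(df, '/home/mydirectory') leaves the original DataFrame untouched and returns a new one; subjects without a matching file get None instead of raising an error.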

Source

2016-10-03 19:30:56

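Consider the recursive os.walk() method, which reads all directories and files either top-down (the default, topdown=True) or bottom-up. You can also use a regex to check file names and filter specifically for .csv files.

The following imports every csv file found in any child or grandchild folder of the target root directory /home/mydirectory. So be sure to check whether any non-subject csv files exist, and if they do, adjust re.match() accordingly: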

import os, re
import pandas as pd

# CURRENT DIRECTORY (PLACE SCRIPT IN /home/mydirectory)
cd = os.path.dirname(os.path.abspath(__file__))

i = 0
columns = range(1, 100)
dfList = []

for root, dirs, files in os.walk(cd):
    for fname in files:
        # escape the dot so that only names ending in ".csv" match
        if re.match(r"^.*\.csv$", fname):
            frame = pd.read_csv(os.path.join(root, fname), skiprows=1,
                                index_col=0, names=columns)
            frame['key'] = "file{}".format(i)
            dfList.append(frame)
            i += 1

df = pd.concat(dfList)
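As an alternative to os.walk() plus a regex, the same recursive search can be sketched with pathlib. This is only a sketch: the helper read_csv_tree and the SubjectID column are hypothetical names, it assumes each file's immediate parent folder is named after the subject, and it reads the files with default read_csv settings rather than the skiprows/names options used above.

```python
from pathlib import Path
import pandas as pd

def read_csv_tree(root):
    """Read every .csv below root and tag rows with the parent folder name."""
    frames = []
    # rglob searches root and all of its sub-folders recursively
    for path in sorted(Path(root).rglob('*.csv')):
        frame = pd.read_csv(path)
        # assumption: the folder holding the file is the subject's folder
        frame['SubjectID'] = path.parent.name
        frames.append(frame)
    # pd.concat raises on an empty list, so return an empty frame instead
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Tagging rows with the folder name rather than a running counter makes it easier to join the result back onto a DataFrame indexed by SubjectID.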

Source

2016-10-03 19:51:43 Parfait
