2015-09-02 335 views

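Reading multiple CSV files from multiple folders into a pandas DataFrame

I want to go through different CSV files contained in different folders within the same directory. The folders are located in my working directory and are named:

folder1, folder2, folder3

Each of them contains CSV files with the same names: csv1.csv, csv2.csv

I tried this code: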

import os 
import re 
import pandas as pd 
from pandas.core.frame import DataFrame 

rootDir = '.' 
for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)

        if "csv1.csv" == fname:
            var = pd.read_csv(fname)

I can print the names of the CSV files in the folders, but I get an error: IOError: File csv1.csv does not exist
What could be the problem?

It looks like, when fname equals 'csv1.csv', you are not passing the file's path to read_csv. You probably need something like 'pd.read_csv(os.path.join(path, fname))' – postelrich

The path to the whole folder, or the path to that CSV file? – user3841581

If your working directory is '/home/user' and your files are in '/home/user/dir1', '/home/user/dir2', ..., you need at least the relative path to the file, e.g. 'dir2/csv1.csv'. – postelrich

Answers

1

As you can see in the comments, you have to join rootDir, dirName and fname:

import os 
import re 
import pandas as pd 
from pandas.core.frame import DataFrame 

rootDir = '.' 
for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
        # Build the full path; fname alone is only the bare file name
        filepath = os.path.join(rootDir, dirName, fname)
        if "csv1.csv" == fname:
            var = pd.read_csv(filepath)
            print(var.head())
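
Because os.walk yields each directory path already prefixed with the root it was given, joining just dirName and fname should be enough in most cases. A minimal sketch of that variant, where find_files is a hypothetical helper name and not part of the original answer:

```python
import os


def find_files(root_dir, target_name):
    """Return the paths of all files called target_name below root_dir."""
    matches = []
    for dir_name, _subdirs, file_names in os.walk(root_dir):
        for file_name in file_names:
            if file_name == target_name:
                # dir_name already starts with root_dir, so one join suffices
                matches.append(os.path.join(dir_name, file_name))
    return sorted(matches)
```

Calling find_files('.', 'csv1.csv') from the working directory would then give one path per folder that contains a csv1.csv, each of which can be passed to pd.read_csv.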

os.path.join(path, *paths)

Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.

On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
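
The rules quoted above can be tried out on any platform by calling the posixpath and ntpath modules directly, which is what os.path resolves to on POSIX and Windows respectively. A small sketch; the paths are made-up examples, not taken from the question:

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules, importable on any platform

# Relative components are chained with single separators.
relative = posixpath.join('.', 'folder1', 'csv1.csv')

# An absolute component discards everything before it.
absolute = posixpath.join('/home/user', '/data/folder1', 'csv1.csv')

# An empty last component leaves a trailing separator.
trailing = posixpath.join('folder1', '')

# On Windows, a rooted component without a drive keeps the current drive...
rooted = ntpath.join('c:\\dir', '\\foo')

# ...and a bare drive letter stays relative to that drive's current directory.
drive_relative = ntpath.join('c:', 'foo')

for value in (relative, absolute, trailing, rooted, drive_relative):
    print(value)
```

The second case is why joining rootDir with a dirName from os.walk stays harmless even when rootDir is absolute: dirName is then absolute too, so the leading rootDir is simply dropped.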

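Thank you very much. I have one question, though: it seems to read only about 7000 rows, but my data contains about 50000 rows per CSV. What is the problem here? – user3841581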

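Actually it works fine; I asked the wrong question. var only keeps a copy of the last CSV, whereas I actually want to concatenate all the columns of every CSV read from the directories; that is, my var should contain the columns of all the CSV files. – user3841581

You can create an empty df and append the data of one CSV column to it - http://stackoverflow.com/questions/32352211/sorting-by-groups/32353725#32353725, but you have to change it to add the column 'code': 'df = pd.DataFrame()' and 'df = df.append(g['code'].tolist(), ignore_index=True)' – jezrael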
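
To keep every file rather than only the last one read, one option is to collect the frames in a list and combine them once with pd.concat; DataFrame.append, as used in the comment above, was removed in pandas 2.0. A rough sketch, assuming all files share the same columns; load_csvs and the source_dir column are hypothetical names introduced here for illustration:

```python
import os

import pandas as pd


def load_csvs(root_dir, target_name):
    """Read every file called target_name below root_dir into one DataFrame."""
    frames = []
    for dir_name, _subdirs, file_names in os.walk(root_dir):
        if target_name in file_names:
            path = os.path.join(dir_name, target_name)
            frame = pd.read_csv(path)
            # Hypothetical extra column recording which folder the rows came from
            frame['source_dir'] = os.path.basename(dir_name)
            frames.append(frame)
    if not frames:
        return pd.DataFrame()
    # Stack rows from all files; axis=1 would place them side by side instead
    return pd.concat(frames, ignore_index=True)
```

With the layout from the question, load_csvs('.', 'csv1.csv') would return the rows of folder1/csv1.csv, folder2/csv1.csv and folder3/csv1.csv in a single frame. If the goal is to put the columns of each file next to each other, passing axis=1 to pd.concat is the usual alternative.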
