2012-11-21 73 views
1

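Python: write the number of files in each subdirectory to a CSV file

I want to tally the number of files in each subdirectory on a Linux system in an Excel spreadsheet.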

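The directories are generally laid out as maindir/person/task/somedata/files. However, the subdirectory layout varies (for example, some people may have no 'task' directory), so I need Python to walk the file paths.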

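My problem is that I need the names of all the subdirectories from 'person' down, but my current code (below) only appends the nearest directory name and a file count. If anyone can help me solve this, it would be much appreciated!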

import os, sys, csv

outwriter = csv.writer(open("Subject_Task_Count.csv", 'w'))

dir_count = []
os.chdir('./../../')
rootDir = "./"  # set the directory you want to start from
for root, dirs, files in os.walk(rootDir):
    for d in dirs:
        a = str(d)
        count = 0
        for f in files:
            count += 1
        y = (a, count)
        dir_count.append(y)

for i in dir_count:
    outwriter.writerow(i)

Answers

0

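Your question isn't entirely clear to me, and you may want to re-read the os.walk documentation. root is the directory currently being traversed, dirs lists the subdirectories immediately inside root, and files lists the files directly inside root. As written, your code counts the same files (the ones in root) once for each subdirectory and records that number as the file count of every subdirectory.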
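To make the roles of root, dirs, and files concrete, here is a minimal sketch (Python 3) that builds a throwaway tree and lists what os.walk yields for each directory. The names alice, bob, and task1, and the helper functions make_demo_tree and walk_summary, are made up for illustration; they are not from the question.

```python
import os
import tempfile


def make_demo_tree(base):
    """Create a small hypothetical tree: one person with a task level, one without."""
    layout = {
        os.path.join("alice", "task1", "somedata"): ["a.txt", "b.txt"],
        os.path.join("bob", "somedata"): ["c.txt"],
    }
    for rel_dir, names in layout.items():
        full_dir = os.path.join(base, rel_dir)
        os.makedirs(full_dir)
        for name in names:
            with open(os.path.join(full_dir, name), "w") as handle:
                handle.write("x")


def walk_summary(top):
    """Return sorted (relative_dir, subdir_names, file_count) tuples for every directory under top."""
    rows = []
    for root, dirs, files in os.walk(top):
        # files holds only the files directly inside root, not those in dirs.
        rows.append((os.path.relpath(root, top), sorted(dirs), len(files)))
    return sorted(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        make_demo_tree(base)
        for rel_dir, subdirs, n_files in walk_summary(base):
            print(rel_dir, subdirs, n_files)
```

Each directory shows up exactly once as root, so len(files) taken at that moment is the file count for that directory and nothing else.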

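Here is what I came up with. Hopefully it's close to what you want; if not, adapt it :) For each directory it writes out the directory, the number of files directly in it, and the number of files in it and all of its subdirectories combined.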

import os
import csv

# Open the csv and write headers.
with open("Subject_Task_Count.csv", 'w', newline='') as out:
    outwriter = csv.writer(out)
    outwriter.writerow(['Directory', 'FilesInDir', 'FilesIncludingSubdirs'])

    # Track total number of files in each subdirectory by absolute path
    totals = {}

    # topdown=False iterates lowest level (leaf) subdirectories first.
    # This way I can collect grand totals of files per subdirectory.
    for path, dirs, files in os.walk('.', topdown=False):
        files_in_current_directory = len(files)

        # Start with the files in the current directory and compute a
        # total for all subdirectories, which will be in the `totals`
        # dictionary already due to topdown=False.
        files_including_subdirs = files_in_current_directory
        for d in dirs:
            fullpath = os.path.abspath(os.path.join(path, d))

            # On my Windows system, Junctions weren't included in os.walk,
            # but would show up in the subdirectory list. This try skips
            # them because they won't be in the totals dictionary.
            try:
                files_including_subdirs += totals[fullpath]
            except KeyError as e:
                print('KeyError: {} may be symlink/junction'.format(e))

        totals[os.path.abspath(path)] = files_including_subdirs
        outwriter.writerow([path, files_in_current_directory, files_including_subdirs])
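If the goal is to get every directory name below 'person' into the spreadsheet, one possible variation (a sketch in Python 3, not the code above) is to split each directory's path, relative to the starting directory, into its components and write them as separate columns. The function names count_rows and write_counts and the Files/Level1/Level2 column headers are assumptions chosen for this example.

```python
import csv
import os


def count_rows(top):
    """Return one row per directory below top: [file_count, component1, component2, ...]."""
    rows = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # sorting in place makes the top-down traversal order deterministic
        rel = os.path.relpath(root, top)
        if rel == os.curdir:
            continue  # skip the starting directory itself
        # Count only the files directly in this directory, then add its path components.
        rows.append([len(files)] + rel.split(os.sep))
    return rows


def write_counts(top, csv_path):
    """Write the rows from count_rows to csv_path with a header sized to the deepest path."""
    rows = count_rows(top)
    depth = max((len(row) - 1 for row in rows), default=0)
    header = ["Files"] + ["Level{}".format(i + 1) for i in range(depth)]
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)
    return rows
```

It could be called as write_counts('maindir', 'Subject_Task_Count.csv'). Putting the count in the first column keeps it in the same place even when some people have no 'task' level, and writing the CSV outside the directory being walked keeps the output file from being counted.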
+0

Thank you so much for your help, this works perfectly. Sorry I didn't get a chance to check until today. – user1843473

3

You should try something along these lines:

for root, dirs, files in os.walk(rootDir):
    print(root, len(files))

It prints each directory along with the number of files directly inside it.
