2014-12-04
0

Python: dynamically load a tuple/list from a settings file

I want to dynamically load a list/tuple from a settings file.

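I need to write a crawler that crawls a website, but I want to know about the files it finds, not the web pages.

I let the user specify the file types in the settings.py file, like this: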

# Document Types during crawling 
textFiles = ['.doc', '.docx', '.log', '.msg', '.pages', '.rtf', '.txt', '.wpd', '.wps'] 
dataFiles = ['.csv', '.dat', '.efx', '.gbr', '.key', '.pps', '.ppt', '.pptx', '.sdf', '.tax2010', '.vcf', '.xml'] 
audioFiles = ['.3g2','.3gp','.asf','.asx','.avi','.flv','.mov','.mp4','.mpg','.rm','.swf','.vob','.wmv'] 


#What lists would you like to use ? 
fileLists = ['textFiles', 'dataFiles', 'audioFiles'] 

I import my settings file in crawler.py.

I use the BeautifulSoup module to find the links in the HTML content and process them as follows:

for item in soup.find_all("a"):
    # We don't want some of them because they just link to the current page or the start page
    if item['href'] in dontWantList:
        continue

    # Check if the link is a file based on the fileLists from the settings
    urlpath = urlparse.urlparse(item['href']).path
    ext = os.path.splitext(urlpath)[1]
    file = False
    for list in settings.fileLists:
        if ext in settings.list:
            file = True
            # Found a file link
            if self.verbose:
                messenger("Found a file of type: %s" % ext, Colors.PURPLE)
            if ext not in fileLinks:
                fileLinks.append(item['href'])

    # Only add the link if it is not a file
    if file is not True:
        links.append(item['href'])
    else:
        # Do not add the file to the other lists
        continue
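
For reference, the extension check in the loop above can be tried in isolation. This is only a minimal sketch: it assumes Python 3, where `urlparse` lives in `urllib.parse` (the code above uses the Python 2 `urlparse` module), and `link_extension` and the example URLs are made up for illustration.

```python
import os
from urllib.parse import urlparse


def link_extension(href):
    """Return the file extension of the path part of href ('' if there is none)."""
    # Only the path is inspected, so query strings and fragments are ignored.
    path = urlparse(href).path
    return os.path.splitext(path)[1]


print(link_extension("http://example.com/docs/report.docx?download=1"))  # .docx
print(repr(link_extension("http://example.com/about/")))  # ''
```

Note that the comparison against the lists in settings.py is case-sensitive, so a link ending in `.DOCX` would not match `'.docx'` unless the extension is lowercased first.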

The following snippet raises an error:

for list in settings.fileLists:
    if ext in settings.list:

Apparently this is because Python treats settings.list as a list.

Is there any way to tell Python to look up the lists dynamically from the settings file?

+2

Don't name your own variable `list`; you are shadowing the built-in. Also, using a `set` would make the membership tests more efficient. – jonrsharpe 2014-12-04 11:59:53

+0

Where does `settings.list` come from? – 2014-12-04 12:04:51

+0

Thanks. I have fixed my naming as well. My IDE wasn't happy either :) – Richard 2014-12-04 12:19:54

Answer

1

I think that instead of this, which is not what you are looking for:

if ext in settings.list: 

you need:

ext_list = getattr(settings, list) 
if ext in ext_list: 
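
To see the lookup in isolation, here is a minimal sketch. It assumes Python 3 and uses a `types.SimpleNamespace` as a stand-in for the real `settings` module (attribute names are taken from the question, with shortened lists); `is_file_extension` and `list_name` are illustrative names, not part of the original code. `getattr(obj, name)` takes the attribute name as a string, which is what `fileLists` holds.

```python
from types import SimpleNamespace

# Stand-in for "import settings"; getattr works the same way on a real module.
settings = SimpleNamespace(
    textFiles=['.doc', '.docx', '.txt'],
    dataFiles=['.csv', '.xml'],
    fileLists=['textFiles', 'dataFiles'],
)


def is_file_extension(ext):
    """Return True if ext appears in any list named in settings.fileLists."""
    for list_name in settings.fileLists:
        # Look up the attribute whose name is stored in list_name.
        ext_list = getattr(settings, list_name)
        if ext in ext_list:
            return True
    return False


print(is_file_extension('.csv'))   # True
print(is_file_extension('.html'))  # False
```

If a name in `fileLists` has no matching attribute, `getattr` raises `AttributeError`; passing a third argument such as `getattr(settings, list_name, [])` falls back to an empty list instead.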

Edit: I agree with jonrsharpe about the `list` naming, so I renamed it in my code.
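
jonrsharpe's other suggestion, using a `set` for the membership test, could be sketched roughly as follows. This is an assumption about how one might combine it with `getattr`, not code from the question: the `SimpleNamespace` stands in for the real `settings` module, and `collect_extensions` and `FILE_EXTENSIONS` are hypothetical names.

```python
from types import SimpleNamespace

# Stand-in for "import settings" with shortened lists from the question.
settings = SimpleNamespace(
    textFiles=['.doc', '.txt'],
    audioFiles=['.mp4', '.avi'],
    fileLists=['textFiles', 'audioFiles'],
)


def collect_extensions(settings_obj):
    """Merge every list named in fileLists into one frozenset."""
    extensions = set()
    for list_name in settings_obj.fileLists:
        extensions.update(getattr(settings_obj, list_name))
    return frozenset(extensions)


# Build the set once, then test each link's extension against it.
FILE_EXTENSIONS = collect_extensions(settings)

print('.mp4' in FILE_EXTENSIONS)   # True
print('.html' in FILE_EXTENSIONS)  # False
```

Building the set once before the crawl loop means each link needs a single hash lookup rather than a scan through every list.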