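Python: dynamically load a tuple/list from a settings file
I want to dynamically load lists/tuples from a settings file.
I need to write a crawler that scrapes a website, but I want to know which files it finds, not just which web pages.
I let the user specify the file types in settings.py, like this: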
# Document Types during crawling
textFiles = ['.doc', '.docx', '.log', '.msg', '.pages', '.rtf', '.txt', '.wpd', '.wps']
dataFiles = ['.csv', '.dat', '.efx', '.gbr', '.key', '.pps', '.ppt', '.pptx', '.sdf', '.tax2010', '.vcf', '.xml']
audioFiles = ['.3g2','.3gp','.asf','.asx','.avi','.flv','.mov','.mp4','.mpg','.rm','.swf','.vob','.wmv']
#What lists would you like to use ?
fileLists = ['textFiles', 'dataFiles', 'audioFiles']
I import my settings file in crawler.py.
I use the BeautifulSoup module to find the links in the HTML content and process them as follows:
for item in soup.find_all("a"):
    # we dont want some of them because it is just a link to the current page or the startpage
    if item['href'] in dontWantList:
        continue
    #check if link is a file based on the fileLists from the settings
    urlpath = urlparse.urlparse(item['href']).path
    ext = os.path.splitext(urlpath)[1]
    file = False
    for list in settings.fileLists:
        if ext in settings.list:
            file = True
            #found file link
            if self.verbose:
                messenger("Found a file of type: %s" % ext, Colors.PURPLE)
            if ext not in fileLinks:
                fileLinks.append(item['href'])
    #Only add the link if it is not a file
    if file is not True:
        links.append(item['href'])
    else:
        #Do not add the file to the other lists
        continue
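The following snippet raises an error:
    for list in settings.fileLists:
        if ext in settings.list:
Apparently this is because Python treats settings.list as an attribute literally named list, rather than using the value of the loop variable.
Is there any way to tell Python to look the lists up dynamically from the settings file?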
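One possible approach, sketched here under assumptions rather than taken from the original post, is the built-in getattr, which looks an attribute up by its string name. The `settings` object below is a hypothetical stand-in built with `types.SimpleNamespace` so the example runs on its own, and `is_file_extension` and `list_name` are illustrative names, not part of the crawler above.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the imported settings module (assumption:
# the real settings.py defines the same attribute names).
settings = SimpleNamespace(
    textFiles=['.doc', '.txt'],
    dataFiles=['.csv', '.xml'],
    fileLists=['textFiles', 'dataFiles'],
)


def is_file_extension(ext, settings):
    """Return True if ext appears in any list named in settings.fileLists."""
    for list_name in settings.fileLists:
        # getattr resolves the attribute from the string held in list_name;
        # writing settings.list_name would look for an attribute that is
        # literally called "list_name".
        extensions = getattr(settings, list_name)
        if ext in extensions:
            return True
    return False


print(is_file_extension('.csv', settings))
print(is_file_extension('.html', settings))
```

With a real settings module the same call works unchanged, since getattr accepts a module as its first argument. Using `list_name` instead of `list` also keeps the built-in available.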
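Don't name your own variable 'list'; you're shadowing the built-in. Also, using a 'set' would make the membership tests more efficient. – jonrsharpe 2014-12-04 11:59:53
Where does 'settings.list' come from? – 2014-12-04 12:04:51
Thanks. I've fixed my naming as well. My IDE wasn't happy about it :) – Richard 2014-12-04 12:19:54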
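The 'set' suggestion in the comments could look roughly like the sketch below, which merges every selected list into a single set once so that each link needs only one membership test. This is an illustration under assumptions: `settings` is again a hypothetical `SimpleNamespace` stand-in for the real module, and `build_extension_set` is a made-up helper name.

```python
from types import SimpleNamespace

# Hypothetical stand-in for settings.py (assumption: same attribute names).
settings = SimpleNamespace(
    textFiles=['.doc', '.txt'],
    dataFiles=['.csv', '.xml'],
    audioFiles=['.mp4', '.avi'],
    fileLists=['textFiles', 'dataFiles'],  # audioFiles deliberately not selected
)


def build_extension_set(settings):
    """Merge every list named in settings.fileLists into one set."""
    wanted = set()
    for list_name in settings.fileLists:
        # Look each list up by name and add its entries to the set.
        wanted.update(getattr(settings, list_name))
    return wanted


# Build the set once, before the crawl loop, then reuse it for every link.
file_extensions = build_extension_set(settings)

print('.csv' in file_extensions)
print('.mp4' in file_extensions)
```

Building the set up front means the inner loop over fileLists disappears from the per-link code, and a lookup in a set does not scan every entry the way a lookup in a list does.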