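How do I use map to read files in Python?

I have the following Python code, which parses URLs from each file in a directory. I'm trying to implement multiprocessing with the map function: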
import glob, os
import xmltodict
import mysql.connector
from multiprocessing import Pool

def get_xml_paths(folder):
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if 'xml' in f)

def openXML(file):
    global i
    doc = xmltodict.parse(file.read())
    for i in range(0, len(doc['urlset']['url'])):
        if i > to:
            break
        ## Validation
        url = doc['urlset']['url'][i]['loc'];
        if "books" in url:
            c.execute("INSERT INTO apps (url) VALUES (%s)", [url])
            conn.commit()
        i = i + 1

if __name__ == '__main__':
    files = get_xml_paths("unzip/")
    pool = Pool()
    pool.map(openXML, files)
    pool.close()
    pool.join()
    c.close()
When I run this program, I get the following error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\O\AppData\Local\Programs\Python\Python35-32\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "C:\Users\O\AppData\Local\Programs\Python\Python35-32\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "C:\Users\O\PycharmProjects\Grabber\grabber.py", line 28, in openXML
doc = xmltodict.parse(file.read())
AttributeError: 'str' object has no attribute 'read'
How can I fix this? I don't see an obvious cause.
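One way to see the cause: the get_xml_paths generator from the question yields path strings, and pool.map passes each of those strings to openXML unchanged, so file.read() is called on a str. The sketch below copies get_xml_paths as written and adds a hypothetical describe helper (not part of the original) that reports what a worker would receive; a temporary directory stands in for unzip/.

```python
import os
import tempfile


def get_xml_paths(folder):
    # Same generator as in the question: it yields path *strings*, not open file objects
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if 'xml' in f)


def describe(item):
    # Hypothetical helper: report the type of the item and whether it has a .read() method
    return type(item).__name__, hasattr(item, 'read')


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        # Create one dummy sitemap file so the generator has something to yield
        with open(os.path.join(folder, 'sitemap1.xml'), 'w') as fh:
            fh.write('<urlset/>')
        for item in get_xml_paths(folder):
            print(describe(item))  # ('str', False)
```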
Have you tried 'open(file).read()'? 'get_xml_paths' returns file names, not file objects.
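A minimal sketch of the fix suggested above, assuming the worker should open the file itself and return its results instead of writing to the database. To keep it dependency-free it swaps xmltodict for the standard library's xml.etree.ElementTree; with xmltodict the essential change is the same, opening the path inside the worker before parsing. The name extract_book_urls is hypothetical, the "to" cutoff from the question is omitted, and the path filter is tightened to names ending in ".xml".

```python
import os
import xml.etree.ElementTree as ET
from multiprocessing import Pool


def get_xml_paths(folder):
    # Return full paths of the files in `folder` whose names end in ".xml"
    return [os.path.join(folder, f)
            for f in os.listdir(folder)
            if f.endswith('.xml')]


def extract_book_urls(path):
    """Open the sitemap file at `path` and return its <loc> URLs containing "books"."""
    # pool.map passes a path string, so the worker opens the file itself
    with open(path, 'rb') as fh:
        root = ET.parse(fh).getroot()
    urls = []
    for el in root.iter():
        # Strip any "{namespace}" prefix so <loc> matches with or without one
        if el.tag.rsplit('}', 1)[-1] == 'loc' and el.text and "books" in el.text:
            urls.append(el.text.strip())
    return urls


if __name__ == '__main__':
    with Pool() as pool:
        # One list of URLs per file, in the same order as the input paths
        results = pool.map(extract_book_urls, get_xml_paths("unzip/"))
    for urls in results:
        for url in urls:
            # Placeholder: run the MySQL INSERT here, in the parent process
            print(url)
```

Collecting results in the parent means the workers need no database connection. If c and conn are created under the __main__ guard, the worker processes started with the spawn method (the default on Windows) would not have them.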
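It looks like you aren't passing anything to your openXML function. Shouldn't it be 'pool.map(openXML(files), files)'? I also notice that your openXML function has no return statement. I'm not sure whether that causes any problems, but you could replace the break statement with a return.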
@NoahChristopher, the syntax is fine... openXML is OK because the first argument to 'pool.map' is a callable; its arguments are taken from the remaining argument...
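The traceback in the question shows how pool.map treats its arguments: the mapstar helper runs list(map(*args)), calling the function once per item. The sketch below uses two hypothetical workers, count_chars and no_return, to illustrate both points from the comments: the function is passed uncalled, and a worker with no return statement makes pool.map collect None for each item.

```python
from multiprocessing import Pool


def count_chars(path):
    # Stand-in worker: pool.map calls this once per item, passing the item itself
    return len(path)


def no_return(path):
    # A worker without a return statement still runs, but map collects None
    len(path)


if __name__ == '__main__':
    paths = ['a.xml', 'bb.xml']
    with Pool(2) as pool:
        print(pool.map(count_chars, paths))  # [5, 6]
        print(pool.map(no_return, paths))    # [None, None]
    # By contrast, pool.map(count_chars(paths), paths) would call count_chars
    # once, immediately, in the parent with the whole list, and then hand its
    # int result to pool.map, which fails because an int is not callable.
```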