2016-11-20 76 views

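How can I use map to read files in Python?

I have the following Python code, which parses the URLs from every file in a directory. I'm trying to implement multiprocessing with the map function.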

import glob, os 
import xmltodict 
import mysql.connector 
from multiprocessing import Pool 


def get_xml_paths(folder): 

    return (os.path.join(folder, f) 
      for f in os.listdir(folder) 
      if 'xml' in f) 

def openXML(file):

    global i
    doc = xmltodict.parse(file.read())
    for i in range(0, len(doc['urlset']['url'])):

        if i > to:
            break

        ## Validation
        url = doc['urlset']['url'][i]['loc'];
        if "books" in url:
            c.execute("INSERT INTO apps (url) VALUES (%s)", [url])
            conn.commit()

    i = i + 1

if __name__ == '__main__': 

    files = get_xml_paths("unzip/") 

    pool = Pool() 
    pool.map(openXML, files) 
    pool.close() 
    pool.join() 
    c.close() 

When I run this program, I get the following error:

multiprocessing.pool.RemoteTraceback: 
""" 
Traceback (most recent call last):
  File "C:\Users\O\AppData\Local\Programs\Python\Python35-32\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\O\AppData\Local\Programs\Python\Python35-32\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\O\PycharmProjects\Grabber\grabber.py", line 28, in openXML
    doc = xmltodict.parse(file.read())
AttributeError: 'str' object has no attribute 'read'

How can I fix this? I don't see an obvious cause.

Have you tried 'open(file).read()'? 'get_xml_paths' returns file names, not file objects. –
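A minimal sketch of what this comment suggests, assuming UTF-8 files: the values yielded by a helper like 'get_xml_paths' are plain path strings, so the worker has to open each one before it can call '.read()'. The helper name 'read_xml_text' is hypothetical, and the '.endswith('.xml')' filter is a slightly stricter stand-in for the question's "'xml' in f" test.

```python
import os

def get_xml_paths(folder):
    # Yields path *strings* such as "unzip/a.xml", not open file objects.
    # Assumption: filter on the ".xml" extension instead of "'xml' in f".
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if f.endswith('.xml'))

def read_xml_text(path):
    # Hypothetical helper: a str has no .read(), so open the path first.
    # The with-block closes the file even if reading raises.
    with open(path, encoding='utf-8') as fh:
        return fh.read()
```

Inside the question's worker, 'file.read()' would then become 'read_xml_text(file)'.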

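It looks like you aren't passing anything to your openXML function. Shouldn't it be 'pool.map(openXML(files), files)'? I also noticed that your openXML function has no return statement; I'm not sure whether that causes any problems. You could replace the break statement with a return. –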
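On the break/return point, a sketch under the assumption that the question's 'to' is meant as an upper bound on how many entries each file contributes: 'itertools.islice' caps the iteration without a global counter, and returning a list gives 'pool.map' something to collect (a worker without a return statement contributes None to the result list). The name 'first_matching' and its parameters are hypothetical.

```python
from itertools import islice

def first_matching(urls, limit, keyword="books"):
    # Look at no more than `limit` entries. The question's `if i > to: break`
    # lets indices 0..to through, which corresponds to limit = to + 1.
    # Returning the matches replaces the global counter `i`.
    return [url for url in islice(urls, limit) if keyword in url]

urls = ["https://example.com/books/1", "https://example.com/music/2",
        "https://example.com/books/3", "https://example.com/books/4"]
print(first_matching(urls, 3))
# ['https://example.com/books/1', 'https://example.com/books/3']
```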

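@NoahChristopher, the syntax is fine... 'openXML' is correct as written, because the first argument of 'pool.map' is a callable; its arguments are taken from the remaining argument (the iterable)... –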
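To make the comment above concrete, a small sketch: 'map(func, iterable)' calls 'func' once per element, passing the element itself, so the worker receives exactly what 'get_xml_paths' yields, here path strings. It uses 'multiprocessing.pool.ThreadPool', which offers the same 'map' interface as 'Pool' but runs in threads, so the demo needs no pickling; 'describe' and 'run_demo' are hypothetical names.

```python
from multiprocessing.pool import ThreadPool

def describe(item):
    # Called once per element of the iterable, with that element as its only argument
    return f"{type(item).__name__}: {item}"

def run_demo(paths):
    # ThreadPool has the same map(func, iterable) signature as Pool,
    # but uses threads, so nothing has to be pickled for this demo
    with ThreadPool(2) as pool:
        return pool.map(describe, paths)

if __name__ == '__main__':
    print(run_demo(["unzip/a.xml", "unzip/b.xml"]))
    # ['str: unzip/a.xml', 'str: unzip/b.xml']
```

Since the worker gets a str, calling '.read()' on it raises exactly the AttributeError in the question.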

Answer

'file' in openXML is a string, not a file object, so it has no 'read' method (strings don't have one). You have to open the file first:

import glob
import xmltodict
import mysql.connector
from multiprocessing import Pool

# Assumes 'conn' is a module-level mysql.connector connection,
# created as in the question's full script (not shown here).

def open_xml(file):
    # 'file' is a path string, so open it before reading
    with open(file) as xml:
        doc = xmltodict.parse(xml.read())
    cursor = conn.cursor()
    for url in doc['urlset']['url']:
        url = url['loc']
        if "books" in url:
            cursor.execute("INSERT INTO apps (url) VALUES (%s)", [url])
            conn.commit()
    cursor.close()

if __name__ == '__main__':
    files = glob.glob("unzip/*.xml")
    pool = Pool()
    pool.map(open_xml, files)
    pool.close()
    pool.join()
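One design alternative, sketched under assumptions: instead of every worker process writing through a shared 'conn', each worker can parse its file and return the matching URLs, and only the parent process talks to MySQL. To keep the sketch runnable without third-party packages it swaps 'xmltodict' for the standard library's 'xml.etree.ElementTree'; 'extract_book_urls' and 'local_name' are hypothetical names, and the database step is left as a comment.

```python
import glob
import xml.etree.ElementTree as ET
from multiprocessing import Pool

def local_name(tag):
    # Strip an XML namespace prefix: "{http://...}loc" -> "loc"
    return tag.rsplit('}', 1)[-1]

def extract_book_urls(path, keyword="books"):
    # Runs in a worker: ET.parse accepts a path string and opens it itself
    root = ET.parse(path).getroot()
    urls = []
    for url_elem in root:
        if local_name(url_elem.tag) != 'url':
            continue
        for child in url_elem:
            text = (child.text or '').strip()
            if local_name(child.tag) == 'loc' and keyword in text:
                urls.append(text)
    return urls

if __name__ == '__main__':
    files = glob.glob("unzip/*.xml")
    with Pool() as pool:
        per_file = pool.map(extract_book_urls, files)
    rows = [(url,) for urls in per_file for url in urls]
    # Parent process only: insert 'rows' here, e.g. with one
    # cursor.executemany("INSERT INTO apps (url) VALUES (%s)", rows)
    # followed by a single conn.commit().
    print(len(rows), "matching URLs")
```

Keeping the connection in one process also sidesteps the question of whether a MySQL connection can safely be shared by several worker processes.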

How can I output the parsed data as a log, something like 'writeln'? – Bedouin
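A minimal sketch for logging the parsed data, assuming the worker is changed to return its URLs so the parent can log them: the standard 'logging' module writes timestamped lines. Logging from the parent rather than inside the workers avoids configuring logging in every child process (on Windows, children start fresh and do not inherit the parent's configuration). 'log_urls' and the 'grabber' logger name are hypothetical.

```python
import logging

logger = logging.getLogger("grabber")

def log_urls(path, urls):
    # One INFO line per URL; returns how many lines were emitted
    for url in urls:
        logger.info("%s: %s", path, url)
    return len(urls)

if __name__ == '__main__':
    # Configure once, in the parent process; pass filename="grabber.log"
    # to write to a file instead of stderr
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    log_urls("unzip/a.xml", ["https://example.com/books/1"])
```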

How can I display how long the script took to run? – Bedouin
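A short sketch for timing, using the standard library's 'time.perf_counter', a monotonic high-resolution clock intended for measuring elapsed time. The wrapper name 'timed' is hypothetical.

```python
import time

def timed(func, *args, **kwargs):
    # perf_counter measures elapsed wall-clock time (not the time of day)
    # and is unaffected by system clock adjustments
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

if __name__ == '__main__':
    result, seconds = timed(sum, range(1_000_000))
    print(f"sum={result} took {seconds:.3f} s")
```

In the answer's script this could wrap the pool call, e.g. 'timed(pool.map, open_xml, files)', which calls 'pool.map(open_xml, files)' and reports how long all files took.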

How can I see which file is currently being processed? – Bedouin
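A sketch for tracking progress, assuming each worker returns its own path along with its result: 'Pool.imap_unordered' hands results back to the parent as soon as each worker finishes, so the parent can report files one at a time instead of waiting for the whole 'map' call. 'count_chars' is a hypothetical stand-in for the real worker. Printing 'os.getpid()' and the path at the start of the worker is another option, though output from several processes may interleave.

```python
from multiprocessing import Pool

def count_chars(path):
    # Hypothetical stand-in for the real worker; returning the path
    # alongside the result tells the parent which file just finished
    return path, len(path)

def run_with_progress(pool, worker, paths):
    # 'paths' must be a list (not a generator) so len() works
    done = []
    results = pool.imap_unordered(worker, paths)
    for n, (path, result) in enumerate(results, start=1):
        print(f"[{n}/{len(paths)}] finished {path}")
        done.append((path, result))
    return done

if __name__ == '__main__':
    paths = ["unzip/a.xml", "unzip/b.xml", "unzip/c.xml"]
    with Pool() as pool:
        run_with_progress(pool, count_chars, paths)
```

Because results arrive in completion order, the printed order can differ from the order of 'paths'.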