2016-12-15 81 views

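Python multiprocessing: Pool.map() doesn't seem to call the function at all

I am very new to multithreading, so I apologize if this is basic. I have a function that OCRs image files, and I want to multithread the task. The function doesn't return anything; it only saves the OCR text for each file. The code is as follows: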

start_time = time.time() 
path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test' 
listfiles = os.listdir(path) 

filterfiles = [p for p in listfiles if p[-4:] == '.tif'] 

pool = Pool(processes=2) 

result = pool.map(OCRimage,filterfiles) 

pool.close() 
pool.join() 

print("--- %s seconds ---" % (time.time() - start_time)) 

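When I run the code, it appears to get stuck on pool.map(). I let it run for 30 minutes, which is longer than the single-process version takes, and it did not produce a single output file. As far as I can tell, my function OCRimage is never called even once (I checked by putting print(1) as the first line of OCRimage). I was wondering if someone could help. Thanks,

Cameron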

Edit (adding the OCRimage function):

The OCRimage function is as follows:

def OCRimage(f):
    # This runs the magick bash script which splits a multi-image tif into multiple single image tiffs
    process = subprocess.Popen(["magick", path + "\\" + f, path + "\\temp\\%d.tif"], shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(process.communicate()[0])

    # finds the number of pages for each tiff file (this might not be necessary, but the all-files-in-directory Python command could access files randomly)
    max1 = -1
    for filename in os.listdir(path+'\\temp'):
        if (max1 < int(filename[0:-4])):
            max1 = int(filename[0:-4])
    max1 = max1 + 1

    text = ""
    for each in range(0,max1):
        im = Image.open(path + "\\temp\\"+ str(each) + ".tif")
        text = text + pytesseract.image_to_string(im)
    with open(path + "\\result\\OCR-"+f[0:-4]+".txt", 'w') as file:
        file.write(text)

    for f in os.listdir(path+'\\temp'):
        os.remove(path + '\\temp\\' + f)

Edit 2: here are all the imports

import time 
import subprocess 
import os 
import pytesseract 
from PIL import Image 

from multiprocessing import Pool 
import multiprocessing 
countcpus = multiprocessing.cpu_count() 

Edit 3:

Running OCRimage(f) by itself works fine. Instead of the multithreading code, I just used this:

path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test' 
for p in os.listdir(path): 
    OCRimage(p) 
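
Instead of printing to stdout, try printing to an output file :) – alfasin

Are you suggesting that printing to stdout won't work for some reason? – cfen

The rest of the code doesn't write the OCR text files to the output files either. – cfen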

Answer

Here is a Minimal, Complete, and Verifiable Example. It seems to indicate that the problem must be in your OCRimage function (but see the Windows section below for the real problem):

from multiprocessing import Pool 

def OCRimage(file_name): 
    print "file_name = %s" % file_name 

filterfiles = ["image%03d.tif" % n for n in range(5)] 

pool = Pool(processes=2) 
result = pool.map(OCRimage, filterfiles) 

pool.close() 
pool.join() 

Output

file_name = image000.tif 
file_name = image001.tif 
file_name = image002.tif 
file_name = image003.tif 
file_name = image004.tif 

I recommend these changes to the beginning of OCRimage:

def OCRimage(file_name):
    print "file_name = %s" % file_name
    # os.path.join takes the path components as separate arguments, not a list
    src = os.path.join(path, file_name)
    dst = os.path.join(path, 'temp', '%d.tif')
    command_list = ['magick', src, dst]
    # This runs the magick bash script which splits a multi-image tif into
    # multiple single image tiffs
    process = subprocess.Popen(command_list,
                               shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    output, errors = process.communicate()
    if process.returncode != 0:
        print "Image processing failed for %s: %s" % (file_name, errors)
        return
    # The rest of your code goes here

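It is important to verify that the return code from the subprocess is zero. If it is not zero, you will really want to look at the errors string.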
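As a rough illustration of that check, here is a minimal Python 3 sketch. `run_command` is a hypothetical helper, not part of the code above, and the failing command is simulated with the Python interpreter so the sketch runs without ImageMagick installed. It also omits shell=True, which is an assumption about what the command needs.

```python
import subprocess
import sys


def run_command(command_list):
    """Run a command and return (ok, output, errors).

    Hypothetical helper for illustration: ok is True only when the
    subprocess exits with return code zero.
    """
    process = subprocess.Popen(command_list,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    output, errors = process.communicate()
    return process.returncode == 0, output, errors


# Simulate a failing external tool: write to stderr and exit non-zero.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('boom'); sys.exit(3)"]
ok, output, errors = run_command(failing)
if not ok:
    # In Python 3 the captured streams are bytes, so decode before printing.
    print("Command failed: %s" % errors.decode())
```

Keeping stderr in its own pipe, rather than merging it into stdout, makes the error text easy to report separately for each file.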

Windows

When I run the MCVE on Windows, I get this exception:

RuntimeError: 
      Attempt to start a new process before the current process 
      has finished its bootstrapping phase. 

      This probably means that you are on Windows and you have 
      forgotten to use the proper idiom in the main module: 

       if __name__ == '__main__': 
        freeze_support() 
        ... 

      The "freeze_support()" line can be omitted if the program 
      is not going to be frozen to produce a Windows executable. 
Traceback (most recent call last): 
    File "<string>", line 1, in <module> 
    File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main 

When I changed the MCVE to this, it worked:

from multiprocessing import Pool 

def OCRimage(file_name): 
    print "file_name = %s" % file_name 

def main(): 
    filterfiles = ["image%03d.tif" % n for n in range(5)] 
    pool = Pool(processes=2) 
    result = pool.map(OCRimage, filterfiles) 
    pool.close() 
    pool.join() 

if __name__ == '__main__': 
    main() 
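
The same guarded pattern also applies when the worker returns a value. Below is a minimal sketch written for Python 3 (the code above targets Python 2.7). The names `square` and `run_pool` are hypothetical stand-ins for illustration, not part of the original code.

```python
from multiprocessing import Pool


def square(n):
    # Hypothetical stand-in for OCRimage: it must be a top-level function
    # so that the worker processes can import it.
    return n ** 2


def run_pool(values):
    # Create the pool only when this function is called, never at import time.
    pool = Pool(processes=2)
    try:
        # pool.map returns the results in the same order as the inputs.
        return pool.map(square, values)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # On Windows each worker re-imports this module; the guard keeps the
    # workers from trying to start pools of their own.
    print(run_pool([1, 2, 3]))
```

Everything that starts processes sits behind the `__name__` check, so importing the module in a worker only defines the functions.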
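
So the issue is that when I don't use multithreading, OCRimage works fine – cfen

So, at the least, my problem is that 'result = pool.map(OCRimage,filterfiles)' doesn't work, even if I make it 'OCRimage(f): return f ** 2'. I'm using Python 2.7 – cfen

Did you run the [mcve] at the top of my answer? Does it produce the expected output? –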
