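Python multiprocessing: Pool.map() doesn't seem to call the function at all
I'm very new to multithreading, so I apologize if this is basic. I have a function that OCRs image files, and I want to multithread the task. The function doesn't return anything; it only saves the text of the OCR'd dataset. The code is below: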
start_time = time.time()
path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
listfiles = os.listdir(path)
filterfiles = [p for p in listfiles if p[-4:] == '.tif']
pool = Pool(processes=2)
result = pool.map(OCRimage,filterfiles)
pool.close()
pool.join()
print("--- %s seconds ---" % (time.time() - start_time))
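When I run the code, it appears to get stuck on pool.map(). I let it run for 30 minutes, which is longer than my trial run took, and it did not produce a single output. I tested my function OCRimage, and it does not seem to be called even once (I put print(1) as the first line of OCRimage). I was wondering if someone could help me. Thanks,
Cameron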
Edit (adding the OCRimage function):
The OCRimage function is below:
def OCRimage(f):
    # This runs the magick bash script, which splits a multi-image tif into multiple single-image tiffs
    process = subprocess.Popen(["magick", path + "\\" + f, path + "\\temp\\%d.tif"], shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(process.communicate()[0])
    # Find the number of pages for each tiff file (this might not be necessary, but listing all files in a directory could return them in arbitrary order)
    max1 = -1
    for filename in os.listdir(path+'\\temp'):
        if (max1 < int(filename[0:-4])):
            max1 = int(filename[0:-4])
    max1 = max1 + 1
    text = ""
    for each in range(0,max1):
        im = Image.open(path + "\\temp\\"+ str(each) + ".tif")
        text = text + pytesseract.image_to_string(im)
    with open(path + "\\result\\OCR-"+f[0:-4]+".txt", 'w') as file:
        file.write(text)
    for f in os.listdir(path+'\\temp'):
        os.remove(path + '\\temp\\' + f)
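One thing to watch if OCRimage runs in several worker processes at once: every call splits its pages into the same temp folder and then deletes everything in it, so two workers could read or remove each other's pages. Below is a minimal sketch of one way around that, assuming each call gets its own scratch directory from tempfile. The helper names page_number, pages_in_order and with_private_scratch are hypothetical, and the magick and pytesseract steps are left out so the snippet runs on its own:

```python
import os
import tempfile


def page_number(filename):
    """Return the integer page index from a name like '12.tif'."""
    return int(os.path.splitext(filename)[0])


def pages_in_order(scratch_dir):
    """List the single-page .tif files in scratch_dir, sorted by page number.

    os.listdir() returns names in arbitrary order, so sort numerically
    rather than alphabetically ('10.tif' must come after '2.tif').
    """
    names = [n for n in os.listdir(scratch_dir) if n.lower().endswith(".tif")]
    return [os.path.join(scratch_dir, n) for n in sorted(names, key=page_number)]


def with_private_scratch(work):
    """Call work(scratch_dir) with a directory that only this call uses.

    TemporaryDirectory removes the directory and its contents on exit,
    which stands in for the manual os.remove() cleanup loop.
    """
    with tempfile.TemporaryDirectory(prefix="ocr-") as scratch_dir:
        return work(scratch_dir)
```

In this layout the magick output pattern would point into scratch_dir instead of the shared temp folder, and the page loop would iterate over pages_in_order(scratch_dir).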
EDIT2: Here are all the imports:
import time
import subprocess
import os
import pytesseract
from PIL import Image
from multiprocessing import Pool
import multiprocessing
countcpus = multiprocessing.cpu_count()
Edit 3:
Running OCRimage(f) by itself works fine. In place of the multithreading code, I just used this:
path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
for p in os.listdir(path):
    OCRimage(p)
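For comparison with the serial loop, here is a minimal sketch of a pool version laid out the way the multiprocessing documentation recommends on Windows, where worker processes are started by re-importing the main module. It assumes the script is run from a file rather than an interactive session. ocr_one is a hypothetical stand-in for OCRimage, and the folder is passed on the command line instead of being hard-coded. Whether a missing guard is what makes pool.map() hang in this case is an assumption, not something the post confirms:

```python
import os
import sys
from multiprocessing import Pool


def ocr_one(tif_path):
    """Hypothetical stand-in for OCRimage: return the file's base name.

    It is defined at module level so worker processes can import it.
    """
    return os.path.basename(tif_path)


def list_tifs(folder):
    """Return full paths of the .tif files in folder, sorted by name."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".tif")
    )


def main(argv):
    """Run ocr_one over every .tif in the folder given as the only argument."""
    if len(argv) != 1 or not os.path.isdir(argv[0]):
        print("usage: python ocr_pool.py FOLDER")
        return 1
    files = list_tifs(argv[0])
    # Leaving the with-block terminates the pool; map() has already
    # returned all results by then.
    with Pool(processes=2) as pool:
        results = pool.map(ocr_one, files)
    print(results)
    return 0


if __name__ == "__main__":
    # Everything that creates the pool sits behind this guard, so a child
    # process that re-imports this module does not start a pool of its own.
    main(sys.argv[1:])
```

The point of the layout is that importing the module has no side effects: the worker function and helpers are plain definitions, and only the guarded block starts processes.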
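Instead of printing to stdout, try printing to an output file :) – alfasin
Are you suggesting that printing to stdout won't work for some reason? – cfen
The rest of the code doesn't write the OCR text files to the output files either. – cfen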
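Following up on the suggestion to write to a file instead of stdout: text printed inside pool workers may not show up in some consoles and IDEs on Windows, so an append-only log file is one way to check whether the function is entered at all. A minimal sketch, assuming a hypothetical log path and a stand-in worker named traced_ocr:

```python
import os
import time

LOG_PATH = "ocr_workers.log"  # hypothetical location; any writable path works


def log_line(message, log_path=LOG_PATH):
    """Append one timestamped line tagged with the current process id."""
    stamp = time.strftime("%H:%M:%S")
    with open(log_path, "a") as log:
        log.write("%s pid=%d %s\n" % (stamp, os.getpid(), message))


def traced_ocr(filename, log_path=LOG_PATH):
    """Stand-in for OCRimage that records when it starts and finishes."""
    log_line("start " + filename, log_path)
    # ... the real OCR work would go here ...
    log_line("done " + filename, log_path)
    return filename
```

If the log stays empty while pool.map(traced_ocr, filterfiles) is running, the workers never reached the function body, which would point at how the pool is started rather than at the OCR code.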