
I have a dataset of 370k records stored in a Pandas DataFrame that needs to be processed. I have tried multiprocessing, threading, Cython and loop unrolling, but with no success: the estimated computation time is 22 hours. The task is shown below. How can I speed up the Python loop?

%matplotlib inline 
from numba import jit, autojit 
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 

with open('data/full_text.txt', encoding="ISO-8859-1") as f: 
    strdata = f.readlines() 

data = [] 
for string in strdata: 
    data.append(string.split('\t')) 

df=pd.DataFrame(data,columns=["uname","date","UT","lat","long","msg"]) 

df=df.drop('UT',axis=1) 

df[['lat','long']] = df[['lat','long']].apply(pd.to_numeric) 

from textblob import TextBlob 
from tqdm import tqdm 

df['polarity']=np.zeros(len(df)) 

Threading:

from queue import Queue 
from threading import Thread 
from time import time 
import logging 

logging.basicConfig(
    level=logging.DEBUG, 
    format='(%(threadName)-10s) %(message)s', 
) 


class DownloadWorker(Thread): 
    def __init__(self, queue): 
        Thread.__init__(self) 
        self.queue = queue 

    def run(self): 
        while True: 
            # Get the work from the queue and expand the tuple 
            lowIndex, highIndex = self.queue.get() 
            for i in range(lowIndex, highIndex - 1): 
                df['polarity'][i] = TextBlob(df['msg'][i]).sentiment.polarity 
            self.queue.task_done() 


def main(): 
    ts = time() 
    # Create a queue to communicate with the worker threads 
    queue = Queue() 
    # Create 8 worker threads 
    for x in range(8): 
        worker = DownloadWorker(queue) 
        worker.daemon = True 
        worker.start() 
    # Put the tasks into the queue as a tuple 
    for i in tqdm(range(0, len(df) - 1, 62936)): 
        logging.debug('Queueing') 
        queue.put((i, i + 62936)) 
    queue.join() 
    print('Took {}'.format(time() - ts)) 

main() 

Multiprocessing with loop unrolling:

import multiprocessing 

def assign_polarity(df): 
    for i in tqdm(range(0, len(df), 5)): 
        df['polarity'][i] = TextBlob(df['msg'][i]).sentiment.polarity 
        df['polarity'][i+1] = TextBlob(df['msg'][i+1]).sentiment.polarity 
        df['polarity'][i+2] = TextBlob(df['msg'][i+2]).sentiment.polarity 
        df['polarity'][i+3] = TextBlob(df['msg'][i+3]).sentiment.polarity 
        df['polarity'][i+4] = TextBlob(df['msg'][i+4]).sentiment.polarity 

pool = multiprocessing.Pool(processes=2) 
r = pool.map(assign_polarity, df) 
pool.close() 

How can I increase the computation speed, or store the computed values in the DataFrame in a faster way? My laptop configuration:

  • RAM: 8 GB
  • Physical cores: 2
  • Logical cores: 8
  • Windows 10

Implementing multiprocessing gave me a higher computation time. The threads executed sequentially (I think because of the GIL). Loop unrolling gave me the same computation speed. Cython gave me errors when importing the libraries.


"So I tried multiprocessing, threading, Cython and loop unrolling." What exactly did not work? Can you post that in the question? – Boggartfly


You need to provide an [MCVE]. – IanS


@Boggartfly Thanks, I have added what did not work. – ASD

Answer


ASD - I have noticed that repeatedly writing into the df is very slow. Try storing your TextBlob results in a list (or another structure) first, and then convert that list into a column of the df.
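
A minimal sketch of that idea, assuming df and TextBlob are already set up as in the question (the column names come from the question; everything else is illustrative):

from textblob import TextBlob 

# Compute all polarities into a plain Python list first, instead of 
# writing into the DataFrame one cell at a time. 
polarities = [TextBlob(msg).sentiment.polarity for msg in df['msg']] 

# Assign the whole list to the column in a single operation. 
df['polarity'] = polarities 

The TextBlob calls still dominate the runtime, but the single column assignment removes the repeated per-cell DataFrame writes described above as the slow part.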