2016-01-18

Multithreading/optimizing Python requests?

I want to optimize this code. Right now it completes 340 requests in 10 minutes, and I am trying to get 1800 requests done in 30 minutes. According to the Amazon API I can make one request per second. Can I multithread this code to increase the number of requests it makes?

However, I am currently reading the full data set inside the main function. Should I split it up, and how do I work out how much of the data each thread should take?

import base64
import csv
import hmac
import threading
import time
import urllib
from hashlib import sha256

import requests
from bs4 import BeautifulSoup

# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are defined elsewhere

def newhmac():
    return hmac.new(AWS_SECRET_ACCESS_KEY, digestmod=sha256)

def getSignedUrl(params):
    hmac = newhmac()  # shadows the hmac module inside this function
    action = 'GET'
    server = "webservices.amazon.com"
    path = "/onca/xml"

    params['Version'] = '2013-08-01'
    params['AWSAccessKeyId'] = AWS_ACCESS_KEY_ID
    params['Service'] = 'AWSECommerceService'
    params['Timestamp'] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

    # the API requires the query parameters to be sorted before signing
    key_values = [(urllib.quote(k), urllib.quote(v)) for k, v in params.items()]
    key_values.sort()
    paramstring = '&'.join(['%s=%s' % (k, v) for k, v in key_values])
    urlstring = "http://" + server + path + "?" + paramstring
    hmac.update(action + "\n" + server + "\n" + path + "\n" + paramstring)
    urlstring = urlstring + "&Signature=" + \
        urllib.quote(base64.encodestring(hmac.digest()).strip())
    return urlstring

def readData():
    data = []
    with open("ASIN.csv") as f:
        reader = csv.reader(f)
        for row in reader:
            data.append(row[0])
    return data

def writeData(data):
    with open("data.csv", "a") as f:
        writer = csv.writer(f)
        writer.writerows(data)

def main():
    data = readData()
    filtData = []
    i = 0
    count = 0
    while i < len(data) - 10:
        if count % 4 == 0:
            time.sleep(1)
        # look up ASINs in batches of 10, the maximum per ItemLookup call
        asins = ','.join([data[x] for x in range(i, i + 10)])
        params = {'ResponseGroup': 'OfferFull,Offers',
                  'AssociateTag': '4chin-20',
                  'Operation': 'ItemLookup',
                  'IdType': 'ASIN',
                  'ItemId': asins}
        url = getSignedUrl(params)
        resp = requests.get(url)
        responseSoup = BeautifulSoup(resp.text, "html.parser")

        quantity = ['' if product.amount is None else product.amount.text
                    for product in responseSoup.findAll("offersummary")]
        price = ['' if product.lowestnewprice is None else product.lowestnewprice.formattedprice.text
                 for product in responseSoup.findAll("offersummary")]
        prime = ['' if product.iseligibleforprime is None else product.iseligibleforprime.text
                 for product in responseSoup("offer")]

        for zz in zip(asins.split(","), price, quantity, prime):
            print zz
            filtData.append(zz)

        print i, len(filtData)
        i += 10
        count += 1
    writeData(filtData)

threading.Timer(1.0, main).start()
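For reference, a minimal sketch of one way to split the ASIN list evenly across worker threads, reusing readData and writeData from the code above; the chunk and worker helpers and the thread count of 4 are hypothetical, not part of the original code:

import threading

def chunk(data, n):
    # split data into n roughly equal slices, one per thread
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def worker(asin_slice, results):
    # run the batching/lookup loop from main() over this slice only;
    # list.append is atomic in CPython, so a shared results list is safe here
    pass

data = readData()
results = []
threads = [threading.Thread(target=worker, args=(part, results))
           for part in chunk(data, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
writeData(results)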

Your code is slow because you are running your requests synchronously, one after another. You could set up a script that uses Python 3's asyncio, or a thread handler like this one: http://stackoverflow.com/a/2635066/2178164 – jumbopap
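A minimal sketch of the asyncio route mentioned in that comment (Python 3.5+ syntax), handing the blocking requests calls to the default thread-pool executor; fetch, fetch_all, and the example URLs are hypothetical:

import asyncio
import requests

def fetch(url):
    # requests is blocking, so it runs on a pool thread below
    return requests.get(url).text

async def fetch_all(urls):
    loop = asyncio.get_event_loop()
    # run_in_executor returns awaitable Futures; gather waits for all of them
    tasks = [loop.run_in_executor(None, fetch, u) for u in urls]
    return await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
pages = loop.run_until_complete(
    fetch_all(["http://example.com/%d" % i for i in range(10)]))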


@jumbopap Thanks, let me take a look and adjust my code to see what happens. – Ben

Answer


If you are on Python 3.2 or later, you can use the concurrent.futures library to run tasks on multiple threads with very little code. For example, here I simulate running 10 URL-parsing jobs in parallel, each taking 1 second; run synchronously they would take 10 seconds, but with a pool of 10 threads they take about 1 second:

import time
from concurrent.futures import ThreadPoolExecutor

def parse_url(url):
    time.sleep(1)  # simulate one second of network latency
    print(url)
    return "done."

st = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(10):
        # submit schedules the call and returns a Future immediately
        future = executor.submit(parse_url, "http://google.com/%s" % i)

print("total time: %s" % (time.time() - st))

Output:

http://google.com/0 
http://google.com/1 
http://google.com/2 
http://google.com/3 
http://google.com/4 
http://google.com/5 
http://google.com/6 
http://google.com/7 
http://google.com/8 
http://google.com/9 
total time: 1.0066466331481934
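To apply this to the question's workload, the batches of 10 ASINs could be submitted to the pool and the responses collected as they finish. A rough sketch, reusing getSignedUrl and readData from the question; the lookup helper and max_workers=10 are assumptions, and Amazon's one-request-per-second limit would still need throttling on top of this:

from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

def lookup(asins):
    # asins is one comma-joined batch of 10 IDs, as in the question's main()
    params = {'ResponseGroup': 'OfferFull,Offers',
              'AssociateTag': '4chin-20',
              'Operation': 'ItemLookup',
              'IdType': 'ASIN',
              'ItemId': asins}
    return requests.get(getSignedUrl(params)).text

data = readData()
batches = [','.join(data[i:i + 10]) for i in range(0, len(data), 10)]

results = []
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(lookup, b) for b in batches]
    for future in as_completed(futures):
        results.append(future.result())  # .result() re-raises any request error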