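I have built a checkout URL for a Shopify site. This is done by appending the "variant" ID of each unique product to the checkout URL and then opening that URL in a web browser. To find a variant ID, I need to parse the site's sitemap. I run a separate thread for each product I am parsing, but each thread adds a considerable amount of time (almost a second), so the whole run still takes a long time.
Why does this happen? Shouldn't it take almost the same time, since each thread does essentially the same thing?
For reference, one thread takes about 2.0 s, two threads about 2.8 s, and three threads about 3.8 s.
Here is my code: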
    import time
    import requests
    from bs4 import BeautifulSoup
    import webbrowser
    import threading

    sitemap2 = 'https://deadstock.ca/sitemap_products_1.xml'
    atc_url = 'https://deadstock.ca/cart/'
    # CHANGE SITEMAP TO THE CORRECT ONE (THE SITE YOU ARE SCRAPING)
    variant_list = []

    def add_to_cart(keywords, size):
        init = time.time()
        # Initialize session
        product_url = ''
        parse_session = requests.Session()
        response = parse_session.get(sitemap2)
        soup = BeautifulSoup(response.content, 'lxml')
        variant_id = 0
        # Find Item
        for urls in soup.find_all('url'):
            for images in urls.find_all('image:image'):
                if all(i in images.find('image:title').text.lower() for i in keywords):
                    now = time.time()
                    product_name = images.find('image:title').text
                    print('FOUND: ' + product_name + ' - ' + str(format(now-init, '.3g')) + 's')
                    product_url = urls.find("loc").text
        if product_url != '':
            response1 = parse_session.get(product_url+".xml")
            soup = BeautifulSoup(response1.content,'lxml')
            for variants in soup.find_all('variant'):
                if size in variants.find('title').text.lower():
                    variant_id = variants.find('id', type='integer').text
                    atc_link = str(variant_id)+':1'
                    print(atc_link)
                    variant_list.append(atc_link)
        try:
            print("PARSED PRODUCT: " + product_name)
        except UnboundLocalError:
            print("Retrying")
            add_to_cart(keywords, size)

    def open_checkout():
        url = 'https://deadstock.ca/cart/'
        for var in variant_list:
            url = url + var + ','
        webbrowser.open_new_tab(url)

    # When initializing a new thread, only change the keywords in the args, and make sure you start and join the thread.
    # Change sitemap in scraper.py to your websites' sitemap
    # If the script finds multiple items, the first item will be opened so please try to be very specific yet accurate.
    def main():
        print("Starting Script")
        init = time.time()
        try:
            t1 = threading.Thread(target=add_to_cart, args=(['alltimers','relations','t-shirt','white'],'s',))
            t2 = threading.Thread(target=add_to_cart, args=(['alltimers', 'relations', 'maroon'],'s',))
            t3 = threading.Thread(target=add_to_cart, args=(['brain', 'dead','melter'], 's',))
            t1.start()
            t2.start()
            t3.start()
            t1.join()
            t2.join()
            t3.join()
            print(variant_list)
            open_checkout()
        except:
            print("Product not found/not yet live. Retrying..")
            main()
        print("Time taken: " + str(time.time()-init))

    if __name__ == '__main__':
        main()
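One thing that stands out in the code above is that every thread downloads and parses the same sitemap again. A minimal sketch of parsing it once and matching several keyword lists against the result might look like the following. This is not the asker's code: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree so that it runs without extra packages, and SAMPLE_SITEMAP, build_index, and find_product are hypothetical names working on made-up example.com data.

```python
import xml.etree.ElementTree as ET

# Namespaces used by sitemap files that carry the image extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

# Hypothetical sample standing in for the downloaded sitemap (assumption).
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/products/relations-tee-white</loc>
    <image:image>
      <image:title>Alltimers Relations T-Shirt White</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://example.com/products/relations-tee-maroon</loc>
    <image:image>
      <image:title>Alltimers Relations T-Shirt Maroon</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://example.com/products/melter-hoodie</loc>
    <image:image>
      <image:title>Brain Dead Melter Hoodie</image:title>
    </image:image>
  </url>
</urlset>
"""


def build_index(sitemap_xml):
    """Parse the sitemap once; return a list of (lowercased title, product URL)."""
    root = ET.fromstring(sitemap_xml)
    index = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        for title in url.findall("image:image/image:title", NS):
            index.append(((title.text or "").lower(), loc))
    return index


def find_product(index, keywords):
    """Return the URL of the first title containing every keyword, else None."""
    for title, loc in index:
        if all(keyword in title for keyword in keywords):
            return loc
    return None


if __name__ == "__main__":
    index = build_index(SAMPLE_SITEMAP)  # the expensive step happens once
    for keywords in (
        ["alltimers", "relations", "t-shirt", "white"],
        ["alltimers", "relations", "maroon"],
        ["brain", "dead", "melter"],
    ):
        print(keywords, "->", find_product(index, keywords))
```

With this shape, only the per-product request (product_url + ".xml" in the original) would still need to run once per item, so that is the part that could reasonably be handed to threads.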
Please indent your code correctly – donkopotamus
@donkopotamus apologies - fixed – JC1
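First, regarding multithreading: using multiple threads always incurs some overhead (thread creation, context switching), so the total time will not stay flat as you add threads. Second, note that the time you measure also includes the 'open_checkout' call, which can hardly be considered constant (unless your bandwidth never changes). Finally, because of the GIL, Python's implementation makes multithreaded code considerably slower than it would be in another language (see https://docs.python.org/2/glossary.html#term-global-interpreterlock) – Adonis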
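To illustrate the two measurement points in the comment above, here is a rough sketch that times only the worker phase and compares running a CPU-heavy function serially against running it in threads. cpu_bound_parse is a hypothetical stand-in for the HTML/XML parsing step (pure-Python work that holds the GIL), and run_serial and run_threaded are illustrative names, not part of the original script. On a standard CPython build the two timings tend to come out close to each other for work like this, but the exact numbers depend on the machine and interpreter, so none are claimed here.

```python
import threading
import time


def cpu_bound_parse(n):
    """Stand-in for CPU-heavy parsing: a pure-Python loop that holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_serial(jobs, n):
    """Run the work `jobs` times in a row; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [cpu_bound_parse(n) for _ in range(jobs)]
    return results, time.perf_counter() - start


def run_threaded(jobs, n):
    """Run the work in `jobs` threads; return (results, elapsed seconds)."""
    results = [None] * jobs

    def worker(slot):
        # Each thread writes to its own slot, so no lock is needed here.
        results[slot] = cpu_bound_parse(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Stop the clock right after join(), before any browser or network call.
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    _, serial_time = run_serial(3, 2_000_000)
    _, threaded_time = run_threaded(3, 2_000_000)
    print("serial:   {:.3f}s".format(serial_time))
    print("threaded: {:.3f}s".format(threaded_time))
```

Threads generally help more with the waiting part (the HTTP requests) than with the parsing part, which is why separating the two in the measurement can make the numbers easier to interpret.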