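I'm working on a scraping project that looks at what recycling companies in the UK offer for different products, which means scraping websites that require interaction.
The site I'm having trouble with is this one: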
http://www.musicmagpie.co.uk/entertainment/
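I have a list of barcodes and want to find their buy prices (enter the barcode into the search box, then click the "Add" button). I have managed to get Selenium WebDriver working, but it is a very slow process, and I can't run a large number of barcodes without the website catching on and killing my process at some point.
My goal is about 1 search per second; at the moment each search takes about 5 seconds or more on average. This is the code I'm running: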
import time
from selenium import webdriver

# EANs is the list of barcodes to look up (defined elsewhere).
ProductName = []
BuyPrice = []
start = time.perf_counter()

driver = webdriver.Chrome(r"C:\Users\leonK\Documents\Python Scripts\chromedriver.exe")
driver.get('http://www.musicmagpie.co.uk/start-selling/basket-media')
countx = 0  # barcodes processed so far
count = 0   # number of basket items expected after the current search
for EAN in EANs:
    countx += 1
    count += 1
    # Restart the browser periodically with a fresh basket.
    if count % 200 == 0:
        driver.close()
        driver = webdriver.Chrome(r"C:\Users\leonK\Documents\Python Scripts\chromedriver.exe")
        driver.get('http://www.musicmagpie.co.uk/start-selling/basket-media')
        count = 1
    driver.find_element_by_xpath("""//*[@id="txtBarcode"]""").send_keys(str(EAN))
    # If popup window appears, exception will close it as first click will fail.
    try:
        driver.find_element_by_xpath("""//*[@id="getValSmall"]""").click()
    except Exception:
        driver.find_element_by_xpath("""//*[@id="gform_close"]""").click()
    prodnames = driver.find_elements_by_xpath("""//div[@class='col_Title']""")
    if len(prodnames) == count:
        # A new item was added to the basket; the newest one is listed first.
        ProductName.append(prodnames[0].text)
        BuyPrice.append(driver.find_elements_by_xpath("""//div[@class='col_Price']""")[0].text)
    else:
        # Nothing was added for this barcode; resync the expected item count.
        ProductName.append('nan')
        BuyPrice.append('nan')
        count = len(prodnames)
    elapsed = time.perf_counter()
    print('MusicMagpieScraper:', EAN, '--', countx, '/', len(EANs), '--', (elapsed - start), 's')
driver.close()
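For reference, the target of roughly one search per second could be enforced explicitly, whichever client ends up doing the lookups. The following is only a sketch: `Throttle` is a hypothetical helper (not part of Selenium or any library), and whether a steady one-second pace is enough to avoid being blocked by the site is an assumption.

```python
import time


class Throttle:
    """Hypothetical helper: keep successive wait() calls min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the pacing can be tested without sleeping
        self._sleep = sleep
        self._last = None      # time of the previous call; None before the first call

    def wait(self):
        """Sleep just long enough to respect min_interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` at the top of the `for EAN in EANs:` loop, with `throttle = Throttle(1.0)`, would only add a delay when a search finishes in under a second.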
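I have some experience using urllib and parsing with BeautifulSoup, and would like to switch to that. However, I don't know how to extract the data without a webdriver to perform the click.
Any suggestions/tips would be much appreciated!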
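To make the parsing half of that concrete: once the result HTML is available as a string, the same `col_Title` and `col_Price` divs that the Selenium code reads could be pulled out without a browser. This is a minimal sketch using only the standard library's `html.parser` (BeautifulSoup would be shorter). `ColumnTextParser` and `extract_products` are hypothetical names, and it assumes the response contains those divs in matching order, which I have not verified against the live site.

```python
from html.parser import HTMLParser


class ColumnTextParser(HTMLParser):
    """Collect the text inside <div> elements whose class includes col_Title or col_Price."""

    WANTED = ("col_Title", "col_Price")

    def __init__(self):
        super().__init__()
        self.rows = {name: [] for name in self.WANTED}
        self._current = None  # class name of the div being captured, if any
        self._depth = 0       # depth of nested <div>s inside the captured div
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            self._depth += 1  # a nested div; keep capturing its text
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in self.WANTED:
            if name in classes:
                self._current, self._depth, self._chunks = name, 0, []
                break

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag != "div" or self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        self.rows[self._current].append("".join(self._chunks).strip())
        self._current = None


def extract_products(html):
    """Return (title, price) pairs in page order."""
    parser = ColumnTextParser()
    parser.feed(html)
    return list(zip(parser.rows["col_Title"], parser.rows["col_Price"]))
```

`extract_products(html)[0]` would then correspond to `prodnames[0].text` and the first `col_Price` text in the Selenium version.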
Added:
The "Add" button's link is:
__doPostBack('ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$tabbedMediaVal_10$getValSmall','')
These are the form fields I found in the JS function:
{name: "__EVENTTARGET", value: ""}
{name: "__EVENTARGUMENT", value: ""}
{name: "__VIEWSTATE", value: "/wEPDwUENTM4MQ9kFgJmD2QWAmYPZBYCZg9kFgJmD2QWBGYPZB…uZSAhaW1wb3J0YW50O2RkQweS+jvDtjK8er7dCKBBRwOWWuE="}
{name: "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$signIn_8$hdn_BasketValue", value: "2"}
{name: "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$tabbedMediaVal_10$txtBarcode", value: "5051275026429"}
{name: "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$tabbedMediaVal_10$wtmBarcode_ClientState", value: ""}
{name: "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$tabbedTechVal_11$txtSearch", value: "Enter item (e.g. iPhone 5)"}
{name: "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$tabbedTechVal_11$wmSearch_ClientState", value: ""}
{name: "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$LegoVal_12$ddlLego", value: "-999"}
{name: "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$TotalValueBox_14$txtPromoVoucher_sm", value: ""}
{name: "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$TotalValueBox_14$txtPromoVoucher", value: ""}
{name: "__SCROLLPOSITIONX", value: "0"}
{name: "__SCROLLPOSITIONY", value: "0"}
{name: "hiddenInputToUpdateATBuffer_CommonToolkitScripts", value: "1"}
The txtBarcode entry is the one where the barcode is entered:
{name: "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$tabbedMediaVal_10$txtBarcode", value: "5051275026429"}
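For context, in ASP.NET WebForms `__doPostBack(target, argument)` writes its two arguments into the `__EVENTTARGET` and `__EVENTARGUMENT` fields and submits the form, so the click could in principle be replayed as a plain POST of the fields listed above. Below is a rough sketch of building that form body with the standard library. `InputCollector`, `build_postback` and `encode_postback` are hypothetical names. It only collects `<input>` elements (the `ddlLego` dropdown is a `<select>` and is not handled), and it is an assumption that the server accepts such a request without further cookies or headers.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

PREFIX = "ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$tabbedMediaVal_10$"
ADD_BUTTON = PREFIX + "getValSmall"    # target taken from the __doPostBack call
BARCODE_FIELD = PREFIX + "txtBarcode"  # field that carries the barcode

# Input types a browser would not submit on a script-triggered postback.
SKIPPED_TYPES = {"submit", "button", "image", "reset"}


class InputCollector(HTMLParser):
    """Collect name/value pairs from the named <input> elements on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if not attrs.get("name"):
            return
        if (attrs.get("type") or "text").lower() in SKIPPED_TYPES:
            return
        self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback(page_html, barcode):
    """Copy the page's current fields (including __VIEWSTATE), then fill in the postback."""
    collector = InputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload["__EVENTTARGET"] = ADD_BUTTON
    payload["__EVENTARGUMENT"] = ""
    payload[BARCODE_FIELD] = str(barcode)
    return payload


def encode_postback(payload):
    """URL-encode the payload as bytes suitable for a POST body."""
    return urlencode(payload).encode("utf-8")
```

The encoded body could then be sent with `urllib.request.Request(url, data=encode_postback(payload))` to the basket page URL, probably with an `http.cookiejar.CookieJar` attached so the basket persists between requests. `__VIEWSTATE` changes with every response, so it would need to be re-read from each returned page rather than reused.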
Hopefully this is useful information. I don't know where to go from here, and Googling hasn't helped much.
Go through these tutorials, they will help you. https://www.youtube.com/playlist?list=PLQVvvaa0QuDfV1MIRBOcqClP6VZXsvyZS – babygame0ver
Example barcode? –
__doPostBack('ctl00$ctl00$ctl00$ContentPlaceHolderDefault$mainContent$tabbedMediaVal_10$getValSmall','') –