I have tried many times, but it doesn't work: why does this code download only one page of data?
import requests
from lxml import html, etree
from selenium import webdriver
import time, json
#how many page do you want to scan
page_numnotint = input("how many page do you want to scan")
page_num = int(page_numnotint)
file_name = 'jd_goods_data.json'
url = 'https://list.jd.com/list.html?cat=1713,3264,3414&page=1&delivery=1&sort=sort_totalsales15_desc&trans=1&JL=4_10_0#J_main'
driver = webdriver.Chrome()
driver.get(url)
base_html = driver.page_source
selctor = etree.HTML(base_html)
date_info = []
name_data, price_data = [], []
jd_goods_data = {}
for q in range(page_num):
    i = int(1)
    while True:
        name_string = '//*[@id="plist"]/ul/li[%d]/div/div[3]/a/em/text()' %(i)
        price_string = '//*[@id="plist"]/ul/li[%d]/div/div[2]/strong[1]/i/text()' %(i)
        if i == 60:
            break
        else:
            i += 1
        name = selctor.xpath(name_string)[0]
        name_data.append(name)
        price = selctor.xpath(price_string)[0]
        price_data.append(price)
        jd_goods_data[name] = price
    print(name_data)
    with open(file_name, 'w') as f:
        json.dump(jd_goods_data, f)
    time.sleep(2)
    driver.find_element_by_xpath('//*[@id="J_bottomPage"]/span[1]/a[10]').click()
    time.sleep(2)
    # for k, v in jd_goods_data.items():
    #     print(k,v)
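I want to download some details, but it doesn't work. If you enter 2 as the number of pages to scan, it downloads the details of only one page, but it downloads them twice!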
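Where do you use your variable `q` (assigned in `for q in range(page_num):`)? I guess you set `page_num` to `2` through the `input` function, but if you want to load details from the second page, you will have to make your script a function of this `q`. – Kanak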
I only used the variable `q` to make the range work, and so to make the loop run. –
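One likely reading of the symptom: `base_html` and `selctor` are built once, before the loop, so every iteration queries the same snapshot of page 1 even though the browser has moved on after the click. A minimal sketch of the idea, assuming the fix is to re-read `driver.page_source` inside the loop. `scrape_pages`, `FakeDriver` and `parse_fake_page` are hypothetical names introduced here for illustration; the fake driver only stands in for Selenium so the sketch runs without a browser.

```python
import json
import time


def scrape_pages(driver, page_num, parse_page, go_to_next_page, delay=0.0):
    """Collect {name: price} from `page_num` consecutive pages."""
    goods = {}
    for page_index in range(page_num):
        # Re-read the HTML on every iteration: a snapshot taken once
        # before the loop never changes, however often "next" is clicked.
        goods.update(parse_page(driver.page_source))
        if page_index < page_num - 1:
            go_to_next_page(driver)
            time.sleep(delay)  # give the next page time to load
    return goods


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver (illustration only)."""

    def __init__(self, pages):
        self._pages = pages
        self._current = 0

    @property
    def page_source(self):
        # Like Selenium's attribute, this reflects the page shown right now.
        return self._pages[self._current]

    def next_page(self):
        self._current += 1


def parse_fake_page(source):
    # Stand-in parser: each fake page is a JSON object of name -> price.
    return json.loads(source)


pages = ['{"book A": "10.00"}', '{"book B": "20.00"}']
result = scrape_pages(FakeDriver(pages), 2, parse_fake_page, FakeDriver.next_page)
print(result)  # {'book A': '10.00', 'book B': '20.00'}
```

With the real script, `parse_page` would wrap `etree.HTML(source)` plus the two XPath queries from the question, and `go_to_next_page` would perform the click on the next-page link. Both are left as parameters here because the exact selectors depend on the live page.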