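Python web scraping fails in a loop but works when I execute it manually

I am trying to programmatically collect data for 6,000 stocks from the web, using Python 3.6 with Selenium WebDriver and Firefox. [I originally planned to parse the HTML with BeautifulSoup, but the URL does not seem to change when I update the page, and BeautifulSoup does not process JavaScript.]
However, when I run this inside a for loop, one particular line in my code, share_price = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(25) > text:nth-child(2)"),
fails most of the time (it did work a few times, so I believe the code itself is fine). If I execute it manually (copying and pasting it into Python IDLE and running it), it works fine. I tried using time.sleep(4)
to let the page finish loading before I scrape anything from it, but that does not seem to be the solution. I am out of ideas now. Can anyone help me figure this out?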
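A fixed time.sleep() only helps if the chart happens to finish rendering within that window. One alternative that might be worth trying is to poll for the element until it appears. The following is a minimal sketch of that idea under stated assumptions: wait_for and fake_locate are hypothetical names introduced here for illustration, and the fake lookup stands in for the real browser call so the example runs without Selenium.

```python
import time


def wait_for(locate, timeout=15.0, poll=0.5, retry_on=(Exception,)):
    """Call locate() repeatedly until it returns without raising, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return locate()
        except retry_on:
            if time.monotonic() >= deadline:
                raise  # give up and surface the last lookup error
            time.sleep(poll)


# Stand-in for a chart label that only exists on the third lookup.
attempts = {"count": 0}


def fake_locate():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not rendered yet")
    return "label text"


result = wait_for(fake_locate, timeout=5.0, poll=0.01, retry_on=(LookupError,))
```

With Selenium, locate could be something like lambda: driver.find_element_by_css_selector(...) with retry_on=(NoSuchElementException,). Selenium also ships this pattern built in as WebDriverWait together with expected_conditions.presence_of_element_located, which is probably the more idiomatic choice. Neither helps if the selector itself does not match the chart that was rendered.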
Here is my code:
from selenium import webdriver
import csv
import time
import pyautogui

# Output file; only the header row is written here.
filename = "historical_price_marketcap.csv"
f = open(filename,"w")
headers = "stock_ticker, share_price, market_cap\n"
f.write(headers)

driver = webdriver.Firefox()

def get_web():
    driver.get("https://stockrow.com")

# Read the tickers from the first column, dropping the first and last character of each.
with open("TICKER.csv") as file:
    read = csv.reader(file)
    TICKER=[]
    for row in read:
        ticker = row[0][1:-1]
        TICKER.append(ticker)

for Ticker in range(len(TICKER)):
    # First pass: "Stock Price" for 2013-12-01 to 2013-12-31.
    # pyautogui drives the page through fixed screen coordinates.
    get_web()
    time.sleep(3)
    pyautogui.click(425, 337)
    pyautogui.typewrite(TICKER[Ticker],0.25)
    time.sleep(2)
    pyautogui.press("enter")
    time.sleep(2)
    pyautogui.click(268, 337)
    pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite('Stock Price',0.25)
    time.sleep(2)
    pyautogui.press("enter")
    time.sleep(2)
    # Clear the start date field and type the new date.
    pyautogui.click(702, 427)
    for i in range(int(10)):
        pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite("2013-12-01",0.25)
    pyautogui.press("enter")
    time.sleep(2)
    # Clear the end date field and type the new date.
    pyautogui.click(882, 425)
    for k in range(10):
        pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite("2013-12-31",0.25)
    pyautogui.press("enter")
    time.sleep(2)
    pyautogui.click(1317, 318)
    for j in range(3):
        pyautogui.press("down")
    time.sleep(10)
    share_price = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(25) > text:nth-child(2)")

    # Second pass: the same steps for "Market Cap".
    get_web()
    time.sleep(3)
    pyautogui.click(425, 337)
    pyautogui.typewrite(TICKER[Ticker],0.25)
    time.sleep(2)
    pyautogui.press("enter")
    time.sleep(2)
    pyautogui.click(268, 337)
    pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite('Market Cap',0.25)
    time.sleep(2)
    pyautogui.press("enter")
    time.sleep(2)
    pyautogui.click(702, 427)
    for i in range(int(10)):
        pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite("2013-12-01",0.25)
    pyautogui.press("enter")
    time.sleep(2)
    pyautogui.click(882, 425)
    for k in range(10):
        pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite("2013-12-31",0.25)
    pyautogui.press("enter")
    time.sleep(2)
    pyautogui.click(1317, 318)
    for j in range(3):
        pyautogui.press("down")
    time.sleep(10)
    market_cap = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(28) > text:nth-child(2)")

f.close()
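As posted, the script writes the header to the output file but never writes the scraped values. A minimal sketch of how each result row might be recorded with the csv module follows. write_row is a hypothetical helper, the row values are made-up placeholders rather than real data, and io.StringIO stands in for the output file so the example runs on its own.

```python
import csv
import io


def write_row(writer, ticker, share_price, market_cap):
    """Append one result row; values are stored as plain strings."""
    writer.writerow([ticker, share_price, market_cap])


# io.StringIO stands in for the real output file in this sketch.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["stock_ticker", "share_price", "market_cap"])

# Placeholder values for illustration only, not real market data.
write_row(writer, "XYZ", "12.34", "5.6B")

output = buffer.getvalue()
```

In the real script the values would presumably come from the located elements, for example share_price.text and market_cap.text, and the file would be opened with open(filename, "w", newline="") before being handed to csv.writer.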
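The lines that seem to be causing the trouble are the two find_element_by_css_selector calls, e.g. share_price = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(25) > text:nth-child(2)")
Here is the Python error message: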
Traceback (most recent call last):
  File "C:\Users\HENGBIN\Desktop\get_historical_data.py", line 65, in <module>
    share_price = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(25) > text:nth-child(2)")
  File "E:\Program Files\python3.6.1\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 457, in find_element_by_css_selector
    return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
  File "E:\Program Files\python3.6.1\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 791, in find_element
    'value': value})['value']
  File "E:\Program Files\python3.6.1\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "E:\Program Files\python3.6.1\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: .highcharts-root > g:nth-child(25) > text:nth-child(2)
It fails most of the time inside the loop, but works fine if I run it manually in Python IDLE. I have no idea what is going on.
BeautifulSoup is probably not a good choice here, because the site relies on JavaScript.
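To illustrate the point about JavaScript: a static parser only sees the markup that is present in the downloaded HTML, not the nodes a script would create once a browser runs it. The sketch below uses the standard library's html.parser as a stand-in for BeautifulSoup, and RAW_HTML is a hypothetical page invented for the example, not the actual markup of the site in the question.

```python
from html.parser import HTMLParser

# Hypothetical page: the chart container is empty in the raw HTML, and the
# script would fill it in only when a browser executes it.
RAW_HTML = """
<html><body>
  <div id="chart"></div>
  <script>
    document.getElementById("chart").innerHTML =
      '<svg class="highcharts-root"><text>label</text></svg>';
  </script>
</body></html>
"""


class TagCollector(HTMLParser):
    """Record the name of every start tag found in the static markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(RAW_HTML)
tags = collector.tags  # script contents are treated as text, not as markup
```

Here tags contains the div and script elements but no svg or text element, because the script body is never executed. If a parser is still wanted, one option is to let the browser render the page first and then parse driver.page_source, though that is subject to the same timing issue as the direct element lookup.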