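I want to get the URL of the link that downloads an asset's historical data from Yahoo Finance for a specific time period: January 1, 1999 to the present. Part of the HTML is missing from what BeautifulSoup sees.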
So, for example, if I go here: https://finance.yahoo.com/quote/XLB/history?period1=915177600&period2=1498633200&interval=1d&filter=history&frequency=1d
I want to get this (from the "Download Data" link above the data table):
"https://query1.finance.yahoo.com/v7/finance/download/XLB?period1=915177600&period2=1498633200&interval=1d&events=history&crumb=iX6bJ6LfGxc"
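I'm using BeautifulSoup, and I'm having trouble with the tag I need: the one that holds the href doesn't show up in the HTML. At first I thought BeautifulSoup wasn't working properly, after I tried find_all('a') and iterated over children/descendants with no results. But when I did a text dump of the HTML, the element (and everything inside its parent element) wasn't there. Can someone explain what's going on? My current code is below.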
from bs4 import BeautifulSoup
import datetime as dTime
import requests
"""
asset = "Materials"
assetSignal = "XLB"
today = dTime.datetime.now()
startTime = str(int(dTime.datetime(1999, 1, 1, 0, 0, 0).timestamp()))
endTime = str(int(dTime.datetime(today.year, today.month, today.day, 0, 0, 0).timestamp()))
url = "https://finance.yahoo.com/quote/" + assetSignal + "/history?period1=" + startTime + "&period2=" + endTime + "&interval=1d&filter=history&frequency=1d"
"""
url = "https://finance.yahoo.com/quote/XLB/history?period1=915177600&period2=1498633200&interval=1d&filter=history&frequency=1d"
page = requests.get(url)
data = page.content
#soup = BeautifulSoup(data, "html.parser")
soup = BeautifulSoup(data, "lxml")
#soup = BeautifulSoup(data, "xml")
#soup = BeautifulSoup(data, "html5lib")
#Link not found
for link in soup.find_all("a"):
    print(link.get("href"))
#Span is empty?
span = soup.find(class_="Fl(end) Pos(r) T(-6px)")
print(span)
print(span.string)
print(span.contents)
for child in span.children:
    print(child)
#Other span has children. Target span doesn't
div = soup.find(class_="C($finDarkGray) Mt(20px) Mb(15px)")
print(div)
for child in div.descendants:
    print(child)
#Is the tag even there?
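# Write as UTF-8 so non-ASCII characters in the page don't raise UnicodeEncodeError
with open("soup.txt", "w", encoding="utf-8") as file:
    file.write(page.text)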
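Dumping page.text is the right test: BeautifulSoup can only search the HTML that requests actually received, and an element that the page's JavaScript creates later never appears in that text. A stdlib-only sketch of the same check (it uses html.parser instead of BeautifulSoup to stay dependency-free; `HrefCollector`, `collect_hrefs` and the sample HTML are made up for illustration) shows that a link added by a script is invisible to an HTML parser:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Record the href of every <a> tag present in the static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def collect_hrefs(html):
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Made-up page: one static link, plus a script that would add the download
# link only when a browser executes it. Script bodies are treated as raw text.
SAMPLE = """
<div class="C($finDarkGray) Mt(20px) Mb(15px)">
  <a href="/quote/XLB">XLB</a>
  <span class="Fl(end) Pos(r) T(-6px)"></span>
</div>
<script>
  var a = document.createElement("a");
  a.href = "https://query1.finance.yahoo.com/v7/finance/download/XLB";
  document.querySelector("span").appendChild(a);
</script>
"""

if __name__ == "__main__":
    hrefs = collect_hrefs(SAMPLE)
    print(hrefs)  # ['/quote/XLB']
    print(any("download" in h for h in hrefs))  # False
```

The empty span in the sample mirrors what the question observes: the container exists in the served HTML, but its contents are filled in client-side.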
Does this code run? The line 'url = https://finance.yahoo.com/quote/XLB/history?period1=915177600&period2=1498633200&interval=1d&filter=history&frequency=1d' looks fishy to me. – patrick
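The code works once you put that URL in quotes, but the download link really isn't in the soup results. It looks like the link is generated by JavaScript, and BeautifulSoup doesn't execute JavaScript, so if you use BeautifulSoup to scrape any data that is delivered or rendered through JS, you won't be able to get it. You may need to look into Selenium or PhantomJS. – davedwards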
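Besides driving a real browser, another possible route is to rebuild the link yourself from pieces the static page may already contain. This is a sketch under loud assumptions: it assumes the crumb appears in an inline script as a JSON fragment of the form "CrumbStore":{"crumb":"..."} (not verified against the live page), and the helper names `extract_crumb`, `to_unix` and `build_download_url` are hypothetical. The crumb is presumably tied to the cookies of the session that fetched the page, so a real download would likely need to reuse the same `requests.Session`; that is also an assumption. The sketch also explains the timestamps: 915177600 is midnight on 1999-01-01 in UTC-8, not UTC midnight (915148800), because the question's naive `datetime(...).timestamp()` uses the machine's local time zone.

```python
import json
import re
from datetime import datetime, timedelta, timezone
from urllib.parse import quote, urlencode

# Endpoint and parameters copied from the "Download Data" link in the question.
DOWNLOAD_BASE = "https://query1.finance.yahoo.com/v7/finance/download/"

# ASSUMPTION: the crumb sits in an inline script as "CrumbStore":{"crumb":"..."}.
CRUMB_RE = re.compile(r'"CrumbStore":\{"crumb":"((?:[^"\\]|\\.)*)"\}')


def extract_crumb(html):
    """Return the decoded crumb from the raw page text, or None if not found."""
    match = CRUMB_RE.search(html)
    if match is None:
        return None
    # The captured value is a JSON string body, so json undoes escapes like \u002F.
    return json.loads('"' + match.group(1) + '"')


def to_unix(dt):
    """Unix seconds for dt; naive datetimes are treated as UTC, not local time."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def build_download_url(symbol, period1, period2, crumb):
    """Rebuild the download link from its parts, in the same parameter order."""
    params = {
        "period1": period1,
        "period2": period2,
        "interval": "1d",
        "events": "history",
        "crumb": crumb,
    }
    return DOWNLOAD_BASE + quote(symbol) + "?" + urlencode(params)


if __name__ == "__main__":
    # Made-up script fragment standing in for page.text.
    sample = '<script>var data = {"CrumbStore":{"crumb":"iX6bJ6LfGxc"}};</script>'
    crumb = extract_crumb(sample)
    print(build_download_url("XLB", 915177600, 1498633200, crumb))
    # 915177600 is 1999-01-01 00:00 in UTC-8; UTC midnight is 915148800.
    print(to_unix(datetime(1999, 1, 1)))  # 915148800
    print(to_unix(datetime(1999, 1, 1, tzinfo=timezone(timedelta(hours=-8)))))  # 915177600
```

With a real page you would call `extract_crumb(page.text)` on the response from the question's `requests.get(url)`; if it returns None, the assumed pattern isn't present and the browser-based approach is the fallback.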