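How can I scrape seq tag data with BeautifulSoup?
I'm new to web scraping. I'm trying to get a FASTA file from here, but somehow I can't. The problem starts with the span tag: I tried a couple of suggestions, but none of them worked for me. I suspect there may be a privacy restriction.
The FASTA data is inside this class, but when I run this code I can see only the FASTA title: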
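import requests
from bs4 import BeautifulSoup

url = "https://www.ncbi.nlm.nih.gov/nuccore/193211599?report=fasta"
res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")
fasta_data = soup.find_all("div")  # every div on the page (not used below)
# Print the text of each div whose class attribute is exactly "seqrprt seqviewer".
for link in soup.find_all("div", {"class": "seqrprt seqviewer"}):
    print(link.text)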
# When I try to reach the sequence directly via the span tags, the output is empty.
div = soup.find("div", {"id": "viewercontent1"})
spans = div.find_all("span")
for span in spans:
    print(span.string)
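One possible explanation, which I have not confirmed against the live page, is that the sequence is filled in by JavaScript after the initial HTML arrives, so requests only ever receives the empty viewer div and its title. If that is the case, a workaround is to skip the HTML page and ask NCBI's E-utilities efetch endpoint for the record as plain FASTA text. The sketch below uses only the standard library; the helper names build_efetch_url, parse_fasta and fetch_fasta are hypothetical, and it assumes the record ID from the URL above is valid in the nuccore database.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint; with retmode=text it returns plain text.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(record_id, db="nuccore"):
    """Build an efetch URL asking for one record in FASTA format."""
    params = {"db": db, "id": record_id, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_fasta(record_id, db="nuccore", timeout=30):
    """Download one record as FASTA text (hypothetical helper, needs network)."""
    with urlopen(build_efetch_url(record_id, db), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With network access, parse_fasta(fetch_fasta("193211599")) should give the header and the sequence as one pair. Nothing here runs a request on import, so the parsing part can be tried on any FASTA string first.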