2017-09-14 71 views

BeautifulSoup does not scrape all of the data

I want to scrape a website, but when I run this code it prints only half of the data (including the review data). Here is my script:

from bs4 import BeautifulSoup 
from urllib.request import urlopen 

inputfile = "Chicago.csv" 
f = open(inputfile, "w") 
Headers = "Name, Link\n" 
f.write(Headers) 

url = "https://www.chicagoreader.com/chicago/best-of-chicago-2011-food-drink/BestOf?oid=4106228" 
html = urlopen(url) 
soup = BeautifulSoup(html, "html.parser") 

page_details = soup.find("dl", {"class":"boccat"}) 
Readers = page_details.find_all("a") 

for i in Readers: 
    poll = i.contents[0] 
    link = i['href'] 
    print(poll) 
    print(link) 
    f.write("{},https://www.chicagoreader.com{}\n".format(poll, link)) 
f.close() 
  1. Is the style of my script wrong?
  2. How can I shorten the code?
  3. When should I use find_all versus find so that I don't get an attribute error? I read the documentation, but I don't understand it.

Answer


To shorten the code, you can switch to the requests library. It is easy to use and precise. If you want to make it even shorter, you can use CSS selectors.

find selects the container, and find_all, inside a for loop, selects the individual items of that container. Here is the full code:

from bs4 import BeautifulSoup 
import csv 
import requests 

outfile = open("chicagoreader.csv", "w", newline='') 
writer = csv.writer(outfile) 
writer.writerow(["Name", "Link"]) 

base = "https://www.chicagoreader.com" 

response = requests.get("https://www.chicagoreader.com/chicago/best-of-chicago-2011-food-drink/BestOf?oid=4106228") 
soup = BeautifulSoup(response.text, "lxml") 
for item in soup.select(".boccat dd a"): 
    writer.writerow([item.text, base + item.get('href')]) 
    print(item.text, base + item.get('href')) 
outfile.close()  # close the file so the CSV is flushed to disk 

Or, using find and find_all:

from bs4 import BeautifulSoup 
import requests 

base = "https://www.chicagoreader.com" 

response = requests.get("https://www.chicagoreader.com/chicago/best-of-chicago-2011-food-drink/BestOf?oid=4106228") 
soup = BeautifulSoup(response.text, "lxml") 
for items in soup.find("dl",{"class":"boccat"}).find_all("dd"): 
    item = items.find_all("a")[0] 
    print(item.text, base + item.get("href")) 
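To answer question 3 directly: find returns the first matching tag, or None when nothing matches, while find_all always returns a list (empty when nothing matches). The attribute error appears when you call a method on the None that a failed find returned. A minimal sketch, using a made-up HTML snippet just for illustration:

```python
from bs4 import BeautifulSoup

html = "<dl class='boccat'><dd><a href='/a'>First</a></dd></dl>"
soup = BeautifulSoup(html, "html.parser")

# find: first matching tag, or None when there is no match
container = soup.find("dl", {"class": "boccat"})        # a Tag
missing = soup.find("dl", {"class": "no-such-class"})   # None

# find_all: always a list, possibly empty
links = container.find_all("a")
print(links[0].text)  # First

# This is the line that would raise the attribute error, because
# `missing` is None, not a Tag:
# missing.find_all("a")  -> AttributeError: 'NoneType' object has no attribute 'find_all'
```

So use find to grab a single container you know exists (and check it for None if you are not sure), and find_all when you want to loop over every match.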

Hi Shahin, could you please give a short example of find_all and find? –


@Mr.Bones, I have given an example of find and find_all. See above. – SIM