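Scraping with BeautifulSoup in Python
I want to scrape some data from this link: http://www.airlinequality.com/airline-reviews/vietjetair/?sortby=post_date%3ADesc&pagesize=100. For example, I'm using BeautifulSoup to extract each reviewer's name, but it doesn't work. I have used BeautifulSoup with other websites and it worked perfectly, so I don't know what is going on here. Can you help me? The code is below: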
from bs4 import BeautifulSoup
import os
import urllib.request

file1 = open(os.path.expanduser(r"~/Desktop/Skytrax Reviews1.csv"), "wb")
file1.write(b"Reviewer" + b"\n")
WebSites = ["http://www.airlinequality.com/airline-reviews/vietjetair/?sortby=post_date%3ADesc&pagesize=100"]
# Loop through each site until it hits a break. I will create a proper loop; it is not ready yet.
for theurl in WebSites:
    thepage = urllib.request.urlopen(theurl)
    print(thepage)
    soup = BeautifulSoup(thepage, 'lxml')
    print(soup)  # <------- This is the main problem
    # This part may not be correct either, but the main problem is in the lines above.
    # findAll returns a list of tags, so .text is read from each tag, not from the list.
    for Reviewer in soup.findAll(attrs={"class": "text_sub_header userStatusWrapper"}):
        Record1 = Reviewer.text.strip()
        print(Record1)
        file1.write(bytes(Record1, encoding="ascii", errors='ignore') + b"\n")
file1.close()
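A minimal sketch of how the fetch and the parse could be separated so each part can be checked on its own. It assumes, without confirmation from the question, that the site may reject urllib's default `Python-urllib/3.x` User-Agent; the browser-like header value, the helper names `build_request`, `fetch_html` and `extract_reviewers`, and the use of `"html.parser"` instead of `'lxml'` are all illustrative choices, not anything the site or the original code requires.

```python
import urllib.request

from bs4 import BeautifulSoup

URL = ("http://www.airlinequality.com/airline-reviews/vietjetair/"
       "?sortby=post_date%3ADesc&pagesize=100")

# Assumption: the site may refuse urllib's default "Python-urllib/3.x"
# User-Agent. This browser-like value is an arbitrary example, not a
# string the site is known to require.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def build_request(url):
    """Wrap the URL in a Request that carries the custom User-Agent."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url, timeout=30):
    """Download the page and return its body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_reviewers(html):
    """Return the text of every element that has both reviewer classes."""
    soup = BeautifulSoup(html, "html.parser")
    # The CSS selector matches tags carrying BOTH classes, in any order,
    # and yields a list of Tags, so the text is read from each Tag.
    return [tag.get_text(strip=True)
            for tag in soup.select(".text_sub_header.userStatusWrapper")]
```

Usage would be `names = extract_reviewers(fetch_html(URL))`. Keeping parsing separate from downloading means `extract_reviewers` can be tried on a saved copy of the page, which shows whether the problem is the download (for example an error status or a block page) or the selector. The `select` call is used because `findAll(attrs={"class": "text_sub_header userStatusWrapper"})` only matches when the class string appears in exactly that order.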
@Rusa_x Thank you for your answer. I'm new to Python, and I'm using the same link as you.