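You are parsing HTML, but you are using an XML parser. You should use soup=BeautifulSoup(data,"html.parser") instead.
The data you need is inside a script tag; the page does not actually contain a table tag. So you need to find the text inside the script tag.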
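N.B.: "html.parser" is the name Beautiful Soup uses for Python's built-in parser, and it is spelled the same way on Python 2.x and 3.x, even though the underlying standard-library module is called HTMLParser in Python 2.x.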
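To see the point about the script tag without hitting the network, here is a minimal sketch run against a made-up page. SAMPLE_HTML and inspect_page are hypothetical names invented for illustration, and the sample markup is only an assumption about how the real page is laid out.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page: no <table> tag, data embedded in a <script> tag.
SAMPLE_HTML = """
<html><body>
<!-- Tablet Image -->
<script>var rows = [{"School Name":"Example College","Rank":"1"}];</script>
</body></html>
"""


def inspect_page(html):
    """Return the number of <table> tags and the text of every <script> tag."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    scripts = soup.find_all("script")
    return len(tables), [script.get_text() for script in scripts]


n_tables, script_texts = inspect_page(SAMPLE_HTML)
print("table tags found:", n_tables)
print("script tags found:", len(script_texts))
print("data is in the script:", '"School Name"' in script_texts[0])
```

Running the same check on the real response text is a quick way to confirm where the data lives before writing any extraction code.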
Here is the code:
import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.payscale.com/college-salary-report/bachelors?page=65"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
scripts = soup.find_all("script")

file_name = open("table.csv", "w", newline="")
writer = csv.writer(file_name)
list_to_write = []
list_to_write.append(["Rank", "School Name", "School Type", "Early Career Median Pay", "Mid-Career Median Pay", "% High Job Meaning", "% STEM"])

for script in scripts:
    text = script.text
    start = 0
    end = 0
    # Only the large script block carries the table data; skip the small ones.
    if len(text) > 10000:
        while start > -1:
            # Each record begins with "School Name"; stop when there are no more.
            start = text.find('"School Name":"', start)
            if start == -1:
                break
            start += len('"School Name":"')
            end = text.find('"', start)
            school_name = text[start:end]
            # The remaining fields are read in the order they appear in the script.
            start = text.find('"Early Career Median Pay":"', start)
            start += len('"Early Career Median Pay":"')
            end = text.find('"', start)
            early_pay = text[start:end]
            start = text.find('"Mid-Career Median Pay":"', start)
            start += len('"Mid-Career Median Pay":"')
            end = text.find('"', start)
            mid_pay = text[start:end]
            start = text.find('"Rank":"', start)
            start += len('"Rank":"')
            end = text.find('"', start)
            rank = text[start:end]
            start = text.find('"% High Job Meaning":"', start)
            start += len('"% High Job Meaning":"')
            end = text.find('"', start)
            high_job = text[start:end]
            start = text.find('"School Type":"', start)
            start += len('"School Type":"')
            end = text.find('"', start)
            school_type = text[start:end]
            start = text.find('"% STEM":"', start)
            start += len('"% STEM":"')
            end = text.find('"', start)
            stem = text[start:end]
            list_to_write.append([rank, school_name, school_type, early_pay, mid_pay, high_job, stem])

# Write the header and all collected rows, then release the file handle.
writer.writerows(list_to_write)
file_name.close()
This will generate the table you need in the CSV file. Don't forget to close the file when you are done.
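The repeated find-and-slice steps can also be folded into a small helper. The following is only a sketch, not the code above: read_field, extract_rows and SAMPLE_TEXT are hypothetical names, the sample string is invented, and it assumes (as the loop above does) that every record starts with "School Name". Unlike the loop above, it leaves a field blank when the key is missing from a record instead of reading into the next one.

```python
def read_field(record, key):
    """Return the value following '"key":"' in record, or "" if it is absent."""
    marker = '"' + key + '":"'
    start = record.find(marker)
    if start == -1:
        return ""
    start += len(marker)
    end = record.find('"', start)
    if end == -1:
        return ""
    return record[start:end]


def extract_rows(text, first_key, other_keys):
    """Split text into records at each first_key marker and read the fields."""
    rows = []
    first_marker = '"' + first_key + '":"'
    pos = text.find(first_marker)
    while pos != -1:
        # A record runs from this marker up to the next one (or to the end).
        next_pos = text.find(first_marker, pos + len(first_marker))
        record = text[pos:] if next_pos == -1 else text[pos:next_pos]
        rows.append({key: read_field(record, key)
                     for key in [first_key] + list(other_keys)})
        pos = next_pos
    return rows


# Invented sample in the same "key":"value" shape the code above searches for.
SAMPLE_TEXT = (
    '[{"School Name":"Example College","Early Career Median Pay":"$50,000","Rank":"1"},'
    '{"School Name":"Sample University","Rank":"2"}]'
)

rows = extract_rows(SAMPLE_TEXT, "School Name", ["Rank", "Early Career Median Pay"])
for row in rows:
    print(row)
```

Because each record is sliced out first, the order of the fields inside a record no longer matters, and the resulting dictionaries could be passed to csv.DictWriter if preferred.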
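I would print 'data' and see whether you can find the table in the page. – metame
Thanks @metama. I did that, and the only thing there is <!-- Tablet Image -->, and bsoup wouldn't find it. Also, if looking for a table is not the right approach here, how would you go about it? Thanks! – oba2311
There is no table tag in the page. All of the table information is inside a script tag. –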