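Python requests gets wrong HTML when using the get function
I am trying to scrape data from this site and save it to a database. When I inspect the site with Firebug, the table rows are well-formed, but my code below gets malformed HTML content.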
from bs4 import BeautifulSoup
import requests
from peewee import SqliteDatabase, CharField, Model

# SQLite database that will hold the scraped rows
db = SqliteDatabase("cybercrime.db")


class CyberCrimeList(Model):
    # One record per table row on the tracker page
    date = CharField()
    url = CharField()
    ip = CharField()
    type = CharField()

    class Meta:
        database = db


url = "http://cybercrime-tracker.net/index.php?m=4"
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, "html.parser")
table = soup.find('table', attrs={'class': 'ExploitTable'})
print(table.tbody)
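But the code returns only the first row, and that row is malformed: I get </tr></td> instead of </td></tr>.
Am I misunderstanding something? What is wrong with my code?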
<tr><td>23-11-2015</td>
<td>jda3.byethost3.com/panel/index.php?login</td>
<td><a href="https://www.virustotal.com/en/ip-address/185.27.134.160/information/" target="_blank">185.27.134.160</a></td>
<td>Solar</td>
<td><a href="https://www.virustotal.com/latest-scan/http://jda3.byethost3.com/panel/index.php?login" target="_blank"><img alt="Scan with VirusTotal" border="0" height="12" longdesc="Scan with VirusTotal" src="vt.png" width="13"/></a> <a href="http://cybercrime-tracker.net/index.php?s=0&m=40&search=Solar"><img alt="Search the family" border="0" height="12" longdesc="Search the family" src="vwicn008.gif" width="13"/></a></td></tr>
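So what are you trying to get? –
I want to populate the database with the date, URL, IP and type values. But right now the code gives only one row instead of four. – Pant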
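A possible explanation, offered as an assumption rather than a confirmed diagnosis: Firebug shows the DOM after the browser has repaired the page, while the raw markup may already contain the mis-ordered </tr></td> pairs. Different parsers recover from invalid markup differently, so passing "lxml" or "html5lib" to BeautifulSoup instead of "html.parser" may be worth trying. The sketch below (Python 3, standard library only) takes another route and builds rows from the opening <tr> and <td> tags alone, so the order of the closing tags does not matter. The names RowCollector, extract_records and SAMPLE are hypothetical, and the second sample row is made-up placeholder data, not taken from the site.

```python
from html.parser import HTMLParser

FIELDS = ("date", "url", "ip", "type")


class RowCollector(HTMLParser):
    """Group <td> text by <tr>, relying only on opening tags."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell texts
        self._in_cell = False   # True while text belongs to the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])        # every <tr> starts a new row
            self._in_cell = False
        elif tag == "td" and self.rows:
            self.rows[-1].append("")    # every <td> starts a new cell
            self._in_cell = True

    def handle_endtag(self, tag):
        # Closing tags may arrive in the wrong order; just stop collecting.
        if tag in ("td", "tr"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def extract_records(html):
    """Return one dict per row that has at least the four expected cells."""
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    return [
        dict(zip(FIELDS, (cell.strip() for cell in row[:4])))
        for row in collector.rows
        if len(row) >= 4
    ]


# Sample with the mis-ordered </tr></td> described in the question.
# The first row reuses values from the question; the second is a placeholder.
SAMPLE = (
    "<table class='ExploitTable'><tbody>"
    "<tr><td>23-11-2015</td>"
    "<td>jda3.byethost3.com/panel/index.php?login</td>"
    "<td><a href='#'>185.27.134.160</a></td>"
    "<td>Solar</td>"
    "<td><a href='#'><img src='vt.png'/></a></tr></td>"
    "<tr><td>22-11-2015</td>"
    "<td>example.com/gate.php</td>"
    "<td><a href='#'>192.0.2.10</a></td>"
    "<td>Placeholder</td>"
    "<td></tr></td>"
    "</tbody></table>"
)

records = extract_records(SAMPLE)
for record in records:
    print(record)
```

With the real page, response.text from requests could be passed to extract_records in place of SAMPLE, and each resulting dict could then be stored with CyberCrimeList.create(**record) once the table has been created. Whether the live markup matches the sample is untested here.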