Parsing HTML with Python's BeautifulSoup

I am trying to parse the following HTML with Python's BeautifulSoup:

<td height="16" class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">76561198134729239</a></td>

I want to extract the value 76561198134729239, but I cannot figure out how to do it. My attempt:

import requests 
from lxml import html 
from bs4 import BeautifulSoup 
r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154") 
content = r.content 
soup = BeautifulSoup(content, "html.parser") 
element = soup.find("td", 
{ 
    "class":"listtable_1", 
    "target":"_blank" 
}) 
print(element.text)


2017-01-18 nooby

There are many such entries in that HTML. To get all of them, you can use the following:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
soup = BeautifulSoup(r.content, "html.parser")

# Visit every matching cell, then every link inside it that opens in a new tab.
for td in soup.find_all("td", class_="listtable_1"):
    for a in td.find_all("a", href=True, target="_blank"):
        print(a.text)

This returns:

76561198143466239 
76561198094114508 
76561198053422590 
76561198066478249 
76561198107353289 
76561198043513442 
76561198128253254 
76561198134729239 
76561198003749039 
76561198091968935 
76561198071376804 
76561198068375438 
76561198039625269 
76561198135115106 
76561198096243060 
76561198067255227 
76561198036439360 
76561198026089333 
76561198126749681 
76561198008927797 
76561198091421170 
76561198122328638 
76561198104586244 
76561198056032796 
76561198059683068 
76561197995961306 
76561198102013044


2017-01-18 13:52:15

"target":"_blank" is an attribute of the anchor tag a inside the td tag. It is not an attribute of the td tag.

You can get the value like this:

from bs4 import BeautifulSoup 

html=""" 
<td height="16" class="listtable_1"> 
    <a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank"> 
     76561198134729239 
    </a> 
</td>""" 

soup = BeautifulSoup(html, 'html.parser') 

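# strip() removes the whitespace and newlines that surround the number in the sample markup.
print(soup.find('td', {'class': "listtable_1"}).find('a', {"target":"_blank"}).text.strip())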

Output:

76561198134729239
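
To make the difference concrete, here is a minimal sketch that runs the combined filter from the question and the chained lookup side by side on the sample markup. The variable names (combined, td, link) are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = (
    '<td height="16" class="listtable_1">'
    '<a href="http://steamcommunity.com/profiles/76561198134729239" '
    'target="_blank">76561198134729239</a></td>'
)
soup = BeautifulSoup(html, "html.parser")

# Both filters are applied to the <td> itself, which has no target attribute,
# so nothing matches and find() returns None.
combined = soup.find("td", {"class": "listtable_1", "target": "_blank"})
print(combined)

# Filtering each tag by its own attribute works.
td = soup.find("td", class_="listtable_1")
link = td.find("a", target="_blank")
print(link.get_text(strip=True))
```

Because the combined lookup returns None, calling .text on it, as the question's code does, raises an AttributeError.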


2017-01-18 13:43:19 MYGz

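"class":"listtable_1" belongs to the td tag and target="_blank" belongs to the a tag, so you should not use them together in a single find().

You can use the "Steam Community" cell as an anchor to find the number that follows it.

Alternatively, use the URL: it contains the information you need and is easy to locate, so you can find the link and split its href on /: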

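import re

# soup is the BeautifulSoup object built from the page, as in the full code further down.
for a in soup.find_all('a', href=re.compile(r'steamcommunity')):
    num = a['href'].split('/')[-1]  # the last path segment is the profile ID
    print(num)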
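
A slightly more defensive variant of the same idea, as a sketch: splitting on / returns an empty string if a URL happens to end with a slash, so this version parses the URL first. The helper name steam_id_from_href and the sample links (including the trailing-slash and example.com ones) are assumptions made for illustration, not taken from the actual page.

```python
import re
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Hypothetical sample: two profile links (one with a trailing slash) and an unrelated link.
html = """
<a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">76561198134729239</a>
<a href="http://steamcommunity.com/profiles/76561198143466239/" target="_blank">76561198143466239</a>
<a href="http://example.com/other">other</a>
"""
soup = BeautifulSoup(html, "html.parser")


def steam_id_from_href(href):
    """Return the last non-empty path segment of a profile URL."""
    path = urlparse(href).path  # drops the scheme, host, and any query string
    return path.rstrip("/").split("/")[-1]


profile_links = soup.find_all("a", href=re.compile(r"steamcommunity\.com/profiles/"))
ids = [steam_id_from_href(a["href"]) for a in profile_links]
print(ids)
```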

Code for the "Steam Community" approach:

import requests
from bs4 import BeautifulSoup
r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154") 
content = r.content 
soup = BeautifulSoup(content, "html.parser") 
for td in soup.find_all('td', string="Steam Community"): 
    num = td.find_next_sibling('td').text 
    print(num)

Output:

76561198143466239 
76561198094114508 
76561198053422590 
76561198066478249 
76561198107353289 
76561198043513442 
76561198128253254 
76561198134729239 
76561198003749039 
76561198091968935 
76561198071376804 
76561198068375438 
76561198039625269 
76561198135115106 
76561198096243060 
76561198067255227 
76561198036439360 
76561198026089333 
76561198126749681 
76561198008927797 
76561198091421170 
76561198122328638 
76561198104586244 
76561198056032796 
76561198059683068 
76561197995961306 
76561198102013044
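
The sibling lookup can also be tried offline. The sketch below assumes each row has a label cell containing exactly "Steam Community" followed by the cell with the profile link; that row layout is inferred from the code above rather than copied from the live page.

```python
from bs4 import BeautifulSoup

# Assumed row layout: a label cell followed by the value cell.
html = """
<table>
  <tr>
    <td class="listtable_1">Steam Community</td>
    <td class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">76561198134729239</a></td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

ids = []
# string= matches only cells whose entire text is exactly "Steam Community".
for label in soup.find_all("td", string="Steam Community"):
    value = label.find_next_sibling("td")  # skips the whitespace between cells
    if value is not None:  # guard against a label cell with no following cell
        ids.append(value.get_text(strip=True))

print(ids)
```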


2017-01-18 13:44:25

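As others have mentioned, you are trying to check the attributes of different elements in a single find() call. Instead, you can chain find() calls as MYGz suggested, or use a CSS selector: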

soup.select_one("td.listtable_1 a[target=_blank]").get_text()

If you need to find multiple elements this way, use select():

for elm in soup.select("td.listtable_1 a[target=_blank]"): 
    print(elm.get_text())
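
One thing to watch for: select_one() returns None when nothing matches, so a guard helps if the page layout might change. A small self-contained sketch on assumed sample markup (the second ID comes from the output listed in another answer; the class name no_such_class is made up):

```python
from bs4 import BeautifulSoup

# Assumed sample: two cells with profile links and one cell without a link.
html = """
<table><tr>
  <td class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">76561198134729239</a></td>
  <td class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198143466239" target="_blank">76561198143466239</a></td>
  <td class="listtable_1">no link here</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

selector = 'td.listtable_1 a[target="_blank"]'

# First match only, with a guard for the no-match case.
first = soup.select_one(selector)
first_id = first.get_text(strip=True) if first is not None else None

# All matches.
all_ids = [a.get_text(strip=True) for a in soup.select(selector)]

# A selector that matches nothing yields None rather than raising.
missing = soup.select_one("td.no_such_class a")

print(first_id, all_ids, missing)
```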


2017-01-18 13:51:47 alecxe
