2013-10-19 66 views

BeautifulSoup: scraping elements by their tag name and ID

I'm using BeautifulSoup, but I don't know how to properly use find, find_all, and the other functions.

If I have:

<div class="hey"></div> 

then using soup.find_all("div", class_="hey")

correctly finds the div in question. However, I don't know how to do the same for the following:

<h3 id="me"></h3> # Find this one via "h3" and "id" 

<li id="test1"></li># Find this one via "li" and "id" 

<li custom="test2321"></li># Find this one via "li" and "custom" 

<li id="test1" class="tester"></li> # Find this one via "li" and "class" 

<ul class="here"></ul> # Find this one via "ul" and "class" 

Any ideas would be greatly appreciated :)

Answers

1

Take a look at the following code:

from bs4 import BeautifulSoup 

html = """ 
<h3 id="me"></h3> 
<li id="test1"></li> 
<li custom="test2321"></li> 
<li id="test1" class="tester"></li> 
<ul class="here"></ul> 
""" 

# Name the parser explicitly so the result does not depend on what is installed 
soup = BeautifulSoup(html, "html.parser") 

# Look at all the h3 tags and find the ones that have an ID of "me". 
# IDs are supposed to be unique, so soup.find_all(id="me") on its own 
# would normally be enough. 
one = soup.find_all("h3", {"id": "me"}) 
print(one) 

# Same as above: if something has an ID, just use the ID 
two = soup.find_all("li", {"id": "test1"}) # ids should be unique 
print(two) 

# Look at all the li tags and find the node with a custom attribute 
three = soup.find_all("li", {"custom": "test2321"}) 
print(three) 

# With unique IDs the ID alone would be enough; this sample reuses 
# id="test1", so the class is what narrows the match to one tag 
four = soup.find_all("li", {"id": "test1", "class": "tester"}) 
print(four) 

# Look at ul tags, and find the one with a class attribute of "here" 
five = soup.find_all("ul", {"class": "here"}) 
print(five) 

Output (the order of attributes within a tag may differ between versions):

[<h3 id="me"></h3>] 
[<li id="test1"></li>, <li class="tester" id="test1"></li>] 
[<li custom="test2321"></li>] 
[<li class="tester" id="test1"></li>] 
[<ul class="here"></ul>] 

This should provide the necessary documentation.

0

From the help:

In [30]: soup.find_all? 
Type:  instancemethod 
String Form: 
<bound method BeautifulSoup.find_all 
File:  /usr/lib/python2.7/site-packages/bs4/element.py 
Definition: soup.find_all(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs) 
Docstring: 
Extracts a list of Tag objects that match the given 
criteria. You can specify the name of the Tag and any 
attributes you want the Tag to have. 

The value of a key-value pair in the 'attrs' map can be a 
string, a list of strings, a regular expression object, or a 
callable that takes a string and returns whether or not the 
string matches for some custom definition of 'matches'. The 
same is true of the tag name. 

So you can pass the attributes either as a dictionary or as named arguments:

In [31]: soup.find_all("li", custom="test2321") 
Out[31]: [<li custom="test2321"></li>] 

In [32]: soup.find_all("li", {"id": "test1", "class": ""}) 
Out[32]: [<li id="test1"></li>] 
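To make the docstring a bit more concrete, here is a small sketch of the value types it lists (string, list of strings, regular expression, callable), alongside the dictionary and keyword forms. It assumes the "html.parser" backend; the data-kind attribute and the "test2" ID are hypothetical additions for illustration and are not part of the original question:

```python
import re

from bs4 import BeautifulSoup

html = """
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<li data-kind="demo"></li>
"""

soup = BeautifulSoup(html, "html.parser")

# The keyword form and the attrs dictionary select the same tags.
by_keyword = soup.find_all("li", custom="test2321")
by_dict = soup.find_all("li", attrs={"custom": "test2321"})
print(by_keyword == by_dict)  # True

# Attribute names that are not valid Python identifiers need the dictionary.
by_data = soup.find_all("li", attrs={"data-kind": "demo"})
print(len(by_data))  # 1

# A compiled regular expression is matched against the attribute value.
by_regex = soup.find_all("li", custom=re.compile(r"^test\d+$"))
print(len(by_regex))  # 1

# A list matches any one of the listed values.
by_list = soup.find_all("li", id=["test1", "test2"])
print(len(by_list))  # 2

# A callable receives the attribute value; guard against a missing attribute.
by_callable = soup.find_all(
    "li", custom=lambda value: value is not None and value.endswith("2321")
)
print(len(by_callable))  # 1
```

The keyword form is shorter for simple cases, while the dictionary works for any attribute name, which may be why some people prefer to use it everywhere.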

I just put everything in 'attrs' :P. That's the easiest way for me :P