2017-01-25 52 views

Python, lxml: retrieving all elements in a list

I am trying to get all the elements in a list from a website.

Given the following HTML snippet:

<ul> 
    <li class="name"> James </li> 
    <li> Male </li> 
    <li> 5'8" </li> 
</ul> 

My current code uses XPath and stores only the name in a list. Is there a way to get all three fields as one list?

My code:

name = tree.xpath('//li[@class="name"]/text()') 

Answers

1
import lxml.html as LH 
tree = LH.parse('data') 
print(tree.xpath('//li[../li[@class="name" and position()=1]]/text()')) 

prints

[' James ', ' Male ', ' 5\'8" '] 

The XPath '//li[../li[@class="name" and position()=1]]/text()' means:

//li                 # all li elements
[                    # whose
  ..                 # parent
  /                  # has a child
  li                 # li element
  [                  # whose
    @class="name"    # class attribute equals "name"
    and              # and
    position()=1     # which is the first li child of that parent
  ]
]
/text()              # return the text of those elements
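
The breakdown above can be checked with a small self-contained sketch. It assumes the HTML is available as a string rather than in a file; the `snippet` variable, the wrapping `<div>`, and the second `<ul>` without a `name` item are made up for illustration. It also strips the whitespace around each value, which the original answer does not do.

```python
import lxml.html as LH

# Hypothetical input: the question's list plus an unrelated list
# whose first <li> has no class="name".
snippet = '''<div>
<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>
<ul>
    <li> unrelated </li>
    <li> items </li>
</ul>
</div>'''

tree = LH.fromstring(snippet)

# Select the text of every li whose parent's first li child has class="name".
fields = tree.xpath('//li[../li[@class="name" and position()=1]]/text()')

# Remove the leading and trailing whitespace from each text node.
cleaned = [field.strip() for field in fields]
print(cleaned)
```

Only the items of the first list should come back, because the first `li` of the second list has no `class="name"` and so its siblings fail the predicate.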
1
from lxml import html 

text = '''<ul> 
    <li class="name"> James </li> 
    <li> Male </li> 
    <li> 5'8" </li> 
</ul> 
<ul> 
    <li class="name"> James </li> 
    <li> Male </li> 
    <li> 5'8" </li> 
</ul> 
<ul> 
    <li class="name"> James </li> 
    <li> Male </li> 
    <li> 5'8" </li> 
</ul>''' 

tree = html.fromstring(text) 
for ul in tree.xpath('//ul[li[@class="name"]]'):  # loop over each ul that has an li child with class="name"
    print(ul.xpath("li/text()"))  # print the text of every li child of that ul

Output:

[' James ', ' Male ', ' 5\'8" '] 
[' James ', ' Male ', ' 5\'8" '] 
[' James ', ' Male ', ' 5\'8" ']
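
If labelled records are more convenient than bare lists, the per-`ul` loop could be extended roughly as follows. This is only a sketch: the field labels `name`, `gender`, and `height`, the helper `ul_to_record`, the wrapping `<div>`, and the second sample person are assumptions for illustration, since the question only shows three positional values.

```python
from lxml import html

# Hypothetical input: two lists with the same three-item layout.
text = '''<div>
<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>
<ul>
    <li class="name"> Anna </li>
    <li> Female </li>
    <li> 5'5" </li>
</ul>
</div>'''

# Assumed labels for the three positional fields.
FIELD_NAMES = ("name", "gender", "height")


def ul_to_record(ul):
    """Return the stripped li texts of one ul as a dict keyed by FIELD_NAMES."""
    values = [value.strip() for value in ul.xpath("li/text()")]
    return dict(zip(FIELD_NAMES, values))


tree = html.fromstring(text)

# One record per ul that has an li child with class="name".
records = [ul_to_record(ul) for ul in tree.xpath('//ul[li[@class="name"]]')]
print(records)
```

Note that `zip` silently drops extra values and leaves out missing ones, so a list with a different number of items would yield a shorter or truncated record rather than an error.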